Rapid and Quantitative Proteome Analysis and Related Methods

ABSTRACT

The invention provides methods for identifying polypeptides. The method can include the steps of simultaneously determining the mass of a subset of parent polypeptides from a population of polypeptides and the mass of fragments of the subset of parent polypeptides; comparing the determined masses to an annotated polypeptide index; and identifying one or more polypeptides of the annotated polypeptide index having the determined masses. The method can further include the steps of determining one or more additional characteristics associated with one or more of the parent polypeptides; comparing the determined characteristics to the annotated polypeptide index; and optionally repeating the steps one or more times, wherein a set of characteristics is determined that identifies a parent polypeptide as a single polypeptide in the annotated polypeptide index. The method can additionally include the step of quantitating the amount of the identified polypeptide in a sample containing the polypeptide.

BACKGROUND OF THE INVENTION

This invention relates generally to proteome analysis and, more specifically, to methods of identifying and/or quantifying a protein or proteins contained in a mixture of proteins.

The classical biochemical approach to study biological processes has been based on the purification to homogeneity by sequential fractionation and assay cycles of the specific activities that constitute a process, the detailed structural, functional and regulatory analysis of each isolated component, and the reconstitution of the process from the isolated components. The Human Genome Project and other genome sequencing programs are turning out in rapid succession the complete genome sequences of specific species and, thus, in principle the amino acid sequence of every protein potentially encoded by that species. It is to be expected that this information resource, unprecedented in the history of biology, will enhance traditional research methods and catalyze progress in fundamentally different research paradigms, one of which is proteomics.

Efforts to sequence the entire human genome along with the genomes of a number of other species have been extraordinarily successful. The genomes of 46 microbial species (TIGR Microbial Database; www.tigr.org) have been completed and the genomes of over one hundred twenty other microbial species are in the process of being sequenced. Additionally, the more complex genomes of eukaryotes, in particular those of the genetically well characterized unicellular organism Saccharomyces cerevisiae and the multicellular species Caenorhabditis elegans and Drosophila melanogaster, have been sequenced completely. Furthermore, a “draft sequence” of the rice genome has been published, and completion of the human and Arabidopsis genomes is imminent. Even in the absence of complete genomic sequences, rich DNA sequence databases have been made publicly available, including those containing over 2.1 million human and over 1.2 million murine expressed sequence tags (ESTs).

ESTs are stretches of approximately 300 to 500 contiguous nucleotides representing partial gene sequences that are being generated by systematic single pass sequencing of the clones in cDNA libraries. On the timescale of most biological processes, with the notable exception of evolution, the genomic DNA sequence can be viewed as static, and a genomic sequence database therefore represents an information resource akin to a library. Intensive efforts are underway to assign “function” to individual sequences in sequence databases. This is attempted by the computational analysis of linear sequence motifs or higher order structural motifs that indicate a statistically significant similarity of a sequence to a family of sequences with known function, or by other means such as comparison of homologous protein functions across species. Other methods have also been used to determine function of individual sequences, including experimental methods such as gene knockouts and suppression of gene expression using antisense nucleotide technology, which can be time consuming and in some cases still insufficient to allow assignment of a biological function to a polypeptide encoded by the sequence.

The proteome has been defined as the protein complement expressed by a genome. This somewhat restrictive definition implies a static nature of the proteome. In reality the proteome is highly dynamic since the types of expressed proteins, their abundance, state of modification, and subcellular locations are dependent on the physiological state of the cell or tissue. Therefore, the proteome can reflect a cellular state or the external conditions encountered by a cell, and proteome analysis can be viewed as a genome-wide assay to differentiate and study cellular states and to determine the molecular mechanisms that control them. Considering that the proteome of a differentiated cell is estimated to consist of thousands to tens of thousands of different types of proteins, with an estimated dynamic range of expression of at least 5 orders of magnitude, the prospects for proteome analysis appear daunting. However, the availability of DNA databases listing the sequence of every potentially expressed protein combined with rapid advances in technologies capable of identifying the proteins that are actually expressed now make proteomics a realistic proposition. Mass spectrometry is one of the essential legs on which current proteomics technology stands.

Quantitative proteomics is the systematic analysis of all proteins expressed by a cell or tissue with respect to their quantity and identity. The proteins expressed in a cell, tissue, biological fluid or protein complex at a given time precisely define the state of the cell or tissue at that time. The quantitative and qualitative differences between protein profiles of the same cell type in different states can be used to understand the transitions between respective states. Traditionally, proteome analysis was performed using a combination of high resolution gel electrophoresis, in particular two-dimensional gel electrophoresis, to separate proteins and mass spectrometry to identify proteins. This approach is sequential and tedious, but more importantly is fundamentally limited in that biologically important classes of proteins are essentially undetectable.

Thus, there exists a need for rapid, efficient, and cost-effective methods of proteome analysis. The present invention satisfies this need and provides related advantages as well.

SUMMARY OF THE INVENTION

The invention provides methods for identifying polypeptides. The method can include the steps of simultaneously determining the mass of a subset of parent polypeptides from a population of polypeptides and the mass of fragments of the subset of parent polypeptides; comparing the determined masses to an annotated polypeptide index; and identifying one or more polypeptides of the annotated polypeptide index having the determined masses. The method can further include the steps of determining one or more additional characteristics associated with one or more of the parent polypeptides; comparing the determined characteristics to the annotated polypeptide index; and optionally repeating the steps one or more times, wherein a set of characteristics is determined that identifies a parent polypeptide as a single polypeptide in the annotated polypeptide index. The method can additionally include the step of quantitating the amount of the identified polypeptide in a sample containing the polypeptide.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows a schematic diagram of a protein identification strategy based on mass spectrometry (MS) and tandem mass spectrometry (MS/MS) measurements.

FIGS. 2A and 2B show two different methods to generate fragment ions of selected peptide ions that are diagnostic for the identification of the parent ion. FIG. 2A shows the selection of a parent ion (in Q1), which is fragmented in a collision cell (Q2). A mass spectrum of the fragments is determined (Q3). FIG. 2B shows that, instead of selecting a single parent ion, multiple parent ions (indicated in “Source region”) are concurrently fragmented in the post-ionization region or collision cell. The fragment ions are then analyzed in a Q1 or other mass analyzer, resulting in a mass spectrum consisting of fragment ions from multiple parent ions.

FIG. 3 shows the steps of a method for comparing and quantitating two polypeptide populations using a polypeptide identification index of Annotated Peptide Tags.

FIGS. 4A, 4B and 4C show identification of a polypeptide using mass spectrometry (MS). FIGS. 4A and 4B show mass spectra of two polypeptides (P1 and P2) obtained using ESI-TOF. Spectra were acquired at low V_(nozzle-skimmer) (10 V) (FIG. 4A) and high V_(nozzle-skimmer) (240 V) (FIG. 4B). FIG. 4C shows a list of 13 and 12 possible polypeptide identifications for P1 and P2, respectively.

DETAILED DESCRIPTION OF THE INVENTION

The invention provides methods for identifying a polypeptide from a population of polypeptides by determining characteristics associated with a polypeptide, or a peptide fragment thereof, comparing the determined characteristics to a polypeptide identification index, and identifying one or more polypeptides in the polypeptide identification index having the same characteristics. The methods of the invention are applicable to proteome analysis and allow rapid and efficient identification of one or more polypeptides in a complex sample. The methods are based on generating a polypeptide identification index, which is a database of characteristics associated with a polypeptide. The polypeptide identification index can be used for comparison of characteristics determined to be associated with a polypeptide from a sample for identification of the polypeptide. Furthermore, the methods can be applied not only to identify a polypeptide but also to quantitate the amount of specific proteins in the sample.

The methods of the invention for identifying a polypeptide are applicable to performing quantitative proteome analysis, or comparisons between polypeptide populations that involve both the identification and quantitation of sample polypeptides. Such a quantitative analysis can be conveniently performed in two separate stages, if desired. As a first step, a reference polypeptide index can be generated representative of the samples to be tested, for example, from a species, cell type or tissue type under investigation, as described herein. The second step is the comparison of characteristics associated with an unknown polypeptide with the reference polypeptide index or indices previously generated. A reference polypeptide index is a database of polypeptide identification codes representing the polypeptides of a particular sample, such as a cell, subcellular fraction, tissue, organ or organism. A polypeptide identification index can be generated that is representative of any number of polypeptides in a sample, including essentially all of the polypeptides potentially expressed in a sample. Accordingly, the methods of the invention advantageously allow the determination of polypeptides in a sample that correlate with or define a particular physiological state of the sample, for example, a disease state. Moreover, once a polypeptide identification index has been generated, the index can be used repeatedly to identify one or more polypeptides in a sample, for example, a sample from an individual potentially having a disease.

For quantitation of a polypeptide in a sample, a polypeptide is compared to a chemically identical molecule that is isotopically labeled, for example, with ¹³C for ¹²C, deuterium for hydrogen, or ¹⁸O for ¹⁶O. Any number of differential isotopes can be incorporated so long as there is a sufficient difference in mass to be distinguished by MS, as disclosed herein. Because the molecules are chemically identical except for the isotopic difference, the molecules behave physicochemically the same. Furthermore, if desired, more than two samples can be compared if a sufficient number of isotopic labels (e.g., d0, d4, d8, d12) are available such that the multiple samples can be compared and distinguished by MS. Quantitation is based on stable isotope dilution. One method to quantitate a sample is to spike a sample with an internal standard that is chemically identical but isotopically different. A standard curve can be generated with dilution of isotope to extrapolate the quantity of molecule in a sample. In such a case, the molecule to be spiked must be identical and therefore the molecules in the sample must be known.
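The arithmetic behind stable isotope dilution can be illustrated with a minimal sketch. The function names below are hypothetical and are not part of the disclosure, and the sketch assumes a single spiked standard whose signal responds identically to that of the unlabeled molecule, which is the premise of the chemically identical, isotopically distinct standard described above. The hydrogen and deuterium masses are standard monoisotopic values.

```python
# Monoisotopic masses in daltons (standard reference values).
MASS_H = 1.00782503
MASS_D = 2.01410178


def label_mass_shift(n_deuterium):
    """Mass added when n hydrogens in a label are replaced by deuterium."""
    return n_deuterium * (MASS_D - MASS_H)


def amount_from_spike(sample_intensity, standard_intensity, standard_amount):
    """Estimate the sample amount from the sample/standard intensity ratio.

    Assumes (for illustration) that signal is proportional to amount and
    that the labeled standard and the analyte respond identically.
    """
    if standard_intensity <= 0:
        raise ValueError("standard intensity must be positive")
    return standard_amount * sample_intensity / standard_intensity


# Example: d4, d8 and d12 labels are separated by about 4.03 Da per step.
shifts = [label_mass_shift(n) for n in (0, 4, 8, 12)]
# Example: a sample peak twice as intense as a 10-unit spike implies 20 units.
estimate = amount_from_spike(3.0e5, 1.5e5, 10.0)
```

A single-point ratio is used here for brevity; a standard curve built from several dilutions of the labeled standard, as described above, would replace the single ratio with a fitted relationship.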

Another convenient method for quantitating polypeptides in a sample is to use a reagent such as ICAT™ (Gygi et al., Nature Biotechnol. 17:994-999 (1999); WO 00/11208). An ICAT™ type reagent, which is described in more detail below, contains an affinity tag, a linker moiety in which one or more stable isotopes can be incorporated, and a reactive group that can covalently couple to an amino acid side chain in a polypeptide such as a cysteine. For quantitation using an ICAT™ type reagent, parallel samples are treated with different isotopic versions of the ICAT™ type reagent. A sample can be labeled and compared to a parallel labeled sample, for example, to normalize to a reference or control sample for quantitation. The use of an ICAT™ type reagent to identify and quantitate polypeptides in a sample is illustrated in FIG. 3. Because the peptides labeled with different isotopic versions of the ICAT™ type reagent behave physicochemically the same, the same polypeptides in the two samples will co-purify but still be distinguishable by MS due to the isotopic differences in the ICAT™ type label. Accordingly, the relative amounts of the same polypeptides can be readily compared and quantitated (Gygi et al., supra, 1999). Every other scan can be devoted to fragmenting and then recording sequence information about an eluting peptide (MS/MS spectrum). The parent polypeptide that this peptide originated from can be identified by searching a sequence database with the recorded MS/MS spectrum. The procedure thus provides the relative quantitation and identification of the components of protein mixtures in a single analysis. Such a comparison can be useful for quantitating the expression levels of polypeptides relative to a reference sample, for example, comparing expression levels in a sample from an individual having a disease or suspected of having a disease to a sample from a healthy individual or for forensic purposes.
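Relative quantitation of differentially labeled peptide pairs can be sketched as follows. This is an illustrative sketch only: the function name, the peak-list layout of (neutral mass, intensity) tuples, the mass tolerance, and the default mass difference of 8.0502 Da (corresponding to eight deuteriums substituted for eight hydrogens on a singly labeled peptide) are assumptions chosen for the example and are not taken from the disclosure.

```python
def pair_ratios(peaks, delta=8.0502, tol=0.01):
    """Find light/heavy peak pairs and return (light_mass, heavy/light ratio).

    peaks: iterable of (neutral_mass, intensity) tuples.
    delta: assumed mass difference between isotopic label versions, in Da.
    tol:   assumed absolute mass tolerance for pairing, in Da.
    """
    peaks = sorted(peaks)
    ratios = []
    for light_mass, light_intensity in peaks:
        if light_intensity <= 0:
            continue  # a zero-intensity peak cannot serve as a denominator
        for heavy_mass, heavy_intensity in peaks:
            # A pair is two peaks separated by the label mass difference.
            if abs(heavy_mass - light_mass - delta) <= tol:
                ratios.append((light_mass, heavy_intensity / light_intensity))
    return ratios


# Hypothetical peak list: one labeled pair plus one unpaired peak.
example_peaks = [(1000.50, 200.0), (1008.5502, 100.0), (1500.0, 50.0)]
print(pair_ratios(example_peaks))  # [(1000.5, 0.5)]
```

Peptides carrying more than one labeled residue would show multiples of the label mass difference; handling that case would require searching for integer multiples of `delta`, which this sketch omits.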

In addition to being useful for quantitation of polypeptides, an ICAT™ type reagent also functions as a constraint on the complexity of the system, that is, only polypeptides or fragments thereof containing the amino acid reactive with the ICAT™ type reagent will be labeled and characterized if the polypeptides are affinity isolated or compared side-by-side with a differentially isotopically labeled sample (Gygi et al., supra, 1999). Accordingly, the use of an ICAT™ type reagent can provide a reduction in complexity of the sample. Furthermore, the ability of a polypeptide or fragment thereof to be labeled with an ICAT™ type reagent, that is, whether the peptide contains the reactive amino acid, is a characteristic associated with the polypeptide useful for identifying the polypeptide in combination with additional characteristics.

An additional advantage of the use of an ICAT™ type reagent is that the identity of polypeptides in a sample need not be known prior to analysis. As described above, isotopic dilution, where an internal standard is spiked into a sample, requires that a chemically identical molecule that is differentially isotopically labeled be spiked into the sample and, therefore, requires that the polypeptide or fragment thereof to be quantitated be known so that a chemically identical isotopically labeled molecule can be added. With an ICAT™ type reagent, no prior knowledge of the exact polypeptides or fragments is needed. Furthermore, there is no need to synthesize a variety of isotopically labeled molecules for characterizing a variety of polypeptides in a sample.

In addition to using a labeling reagent such as an ICAT™ type reagent that incorporates an affinity label, other labeling reagents can be used to differentially isotopically label two different samples containing polypeptides. For example, two chemically identical reagents containing different isotopes can be used to covalently modify two polypeptide samples, where the reagents do not contain an affinity tag. Accordingly, instead of using an affinity isolation step associated with an ICAT™ type tag, other isolation steps, if desired, can be used. Nevertheless, the differentially isotopically labeled polypeptide samples can be compared for quantitative analysis. For example, methylation of polypeptides via esterification with methanol containing d0 (no deuterium) versus d3 (three deuteriums) can be used to differentially isotopically label two polypeptide samples. Similarly, any of the well known methods for modifying side chain amino acids in polypeptides can analogously be used with differentially labeled isotopes such as deuterium for hydrogen, ¹³C for ¹²C, or ¹⁸O for ¹⁶O (see, for example, Glazer et al., Laboratory Techniques in Biochemistry and Molecular Biology: Chemical Modification of Proteins, Chapter 3, pp. 68-120, Elsevier Biomedical Press, New York (1975); Pierce Catalog (1994), Pierce, Rockford Ill.). Any number of the differential isotopes can be incorporated so long as parallel labeled polypeptides contain a sufficient mass distinction to be detected by MS. In addition to chemical modification of a polypeptide, as described above, two polypeptide samples can be digested with a protease such as trypsin or the like in the presence of ¹⁶O- versus ¹⁸O-labeled H₂O. Since the protease cleavage reaction results in the addition of water to the cleaved peptides, cleavage in the presence of isotopically differentially labeled H₂O can be used to incorporate differential labels into separate polypeptide samples. It is understood that any method useful for incorporating an isotopic label to differentially label two polypeptide samples can be used in methods of the invention, particularly for quantitative methods, so long as the samples to be compared are treated in a chemically similar fashion such that the resulting labeled polypeptides essentially differ only by the differential isotopic label.

Still another method to quantitate a sample is to allow metabolic incorporation of isotopes into two samples for comparison, either by incubating a sample in the presence of an isotope or by incubating it in media that results in depletion of a naturally occurring isotope (see, for example, Oda et al., Proc. Natl. Acad. Sci. USA 96:6591-6596 (1999)). Such a method is particularly useful for a sample that is conveniently cultured, for example, a microbial sample or a primary culture of cells obtained from an individual. Accordingly, both in vitro and in vivo methods can be used to differentially isotopically label two samples for comparison and/or quantitation.

The methods of the invention are based on determining characteristics of a polypeptide that allow identification of the polypeptide based on the determined physicochemical characteristics. The collection of physicochemical characteristics that can function to identify a polypeptide is essentially a “bar code” for the polypeptide, that is, a collection of characteristics sufficient to uniquely identify a polypeptide based on correlating the characteristics with a reference database that functions as a polypeptide identification index. The methods are particularly advantageous for rapid and efficient analysis of complex samples containing many different polypeptides, which would be time consuming and inefficient using other methods. The methods of the invention can thus be applied to analyze complex samples containing numerous different polypeptides and are particularly useful in proteomics applications. Accordingly, the methods of the invention can be advantageously used to identify polypeptides of the proteome. Since the proteome reflects polypeptide expression and post-translational modifications correlated with the metabolic state of the cell, the methods can also be used in diagnostic applications to determine normal or aberrant polypeptide expression associated with a disease. Accordingly, the methods of the invention can be used in clinical applications to diagnose a disease or condition.

The methods of the invention advantageously use constraining parameters that allow the identification of a polypeptide from a complex mixture of different polypeptides. The constraints can be used to simplify the identification of polypeptides. A constraint can be, for example, the inclusion of one or more additional characteristics associated with a polypeptide, the identification of a subset of polypeptides from a complex mixture, or any type of constraint that can be used to simplify the analysis of a complex mixture of polypeptides. The methods of the invention thus provide more efficient identification of polypeptides in a complex mixture, including large numbers of polypeptides, which is particularly useful for proteome analysis.

The generation and use of a polypeptide identification index provide several advantages. First, the methods can be used with selective isolation of polypeptide fragments containing specific structural features, which can be exploited by tagging with specific chemical reagents. The affinity selection of “tagged” fragments simplifies the polypeptide mixture and is compatible with the highly denaturing/solubilizing conditions that can be used for protein isolation and handling. The selective isolation of fragments also constrains database searching. For example, selective cysteine tagging, as disclosed herein, reduces the complexity of the peptide mixture by approximately 10-fold.
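The way in which selective cysteine tagging constrains the peptide mixture can be sketched in silico. The sketch below is illustrative only: the function names and the example sequence are hypothetical, and the cleavage rule used (cleavage after lysine or arginine unless the next residue is proline) is a commonly used simplification of trypsin specificity rather than a rule stated in this disclosure.

```python
def tryptic_peptides(sequence):
    """Split a protein sequence after K or R, except when followed by P."""
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal remainder
    return peptides


def cysteine_peptides(sequence):
    """Keep only the peptides that a cysteine-reactive tag could label."""
    return [p for p in tryptic_peptides(sequence) if "C" in p]


# Hypothetical sequence used only to show the filtering step.
seq = "MACKLLRPEKGCSTRWWK"
print(tryptic_peptides(seq))   # ['MACK', 'LLRPEK', 'GCSTR', 'WWK']
print(cysteine_peptides(seq))  # ['MACK', 'GCSTR']
```

The degree of simplification obtained on a real proteome depends on its cysteine content; this toy sequence is not intended to reproduce the approximately 10-fold figure given above.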

A second advantage of the invention methods is that they can be readily used in a variety of laboratory settings. For example, mass measurements are absolute and chromatographic parameters can be easily standardized. Therefore, a polypeptide identification index determined by methods of the invention is easily transferable between laboratories, and data generated by different laboratories can be easily compared with a polypeptide identification index generated under similar conditions. This advantage can be further exploited by making the method accessible via a network, for example, through the construction of a Web-based search tool. A third advantage is that the methods can be performed with a single stage mass analysis, which is fast, simple and sensitive. A fourth advantage is that the methods can be used to accurately measure the ratio of each polypeptide present in a complex polypeptide sample, provided that the samples have been modified with a stable isotope label. Finally, the methods have an essentially unlimited sample capacity, assuring the possibility of analyzing polypeptides of very low abundance, and have a high peak capacity, allowing for the analysis of very complex samples.

As disclosed herein, in addition to isolating individual parent ions prior to fragmentation, multiple ions can be fragmented in parallel without single ion selection (see FIG. 2). Accordingly, a key advantage of such a method is that the parameters can be easily determined in parallel for multiple polypeptides rather than separately for each peptide as is the case in protein identification by MS/MS.

In one embodiment of the invention, a polypeptide identification index is generated by determining characteristics associated with a polypeptide, in particular, fragment ion mass measurements by MS/MS generated with or without parent ion selection (FIG. 2) and optionally including chromatographic steps. These mass determinations are not required to be at high accuracy. The accurate mass can be calculated, if desired, and compiled into an index with other characteristics associated with a particular polypeptide. A sufficient number of characteristics are determined to allow identification of a polypeptide in the index. The methods can optionally and advantageously be used with quantitation to provide additional information on the physiological state of a sample. However, in the case of simpler systems, for example, microbial or viral genomes or specimens from an individual containing a smaller number of polypeptides such as spinal fluid, the complexity of polypeptides in a sample can be sufficiently small that qualitative analysis of the polypeptides in a sample is sufficient for particular applications. As such, if a qualitative determination of the expression of a polypeptide in a sample is sufficient to correlate with a particular condition, for example, a disease condition, then the methods of the invention can be applied to a qualitative identification of a polypeptide in a sample.

For example, as shown in FIG. 2, an invention method can be performed in the absence of single ion selection or in the absence of ion selection in a source region. The regions designated, for example, as Q₁, Q₂, Q₃ and the like, refer to quadrupoles. These are physical means to separate a selected ion based on m/z. However, it is understood that any appropriate methods suitable for separating selected ions, in addition to the use of quadrupoles, can be used in methods of the invention.

As used herein, the term “characteristic” when used in reference to a polypeptide refers to a physicochemical property of a polypeptide. Physicochemical properties include physicochemical properties of a parent polypeptide such as molecular mass, amino acid composition, pI and the like, as well as physicochemical properties of a fragment of a polypeptide, including fragment ions, which can be correlated with a polypeptide and are thus considered to be characteristics associated with a parent polypeptide. Physicochemical properties of a polypeptide also include measurable behaviors of a polypeptide that result from its particular physicochemical properties. For example, physicochemical properties include the order of elution on specific chromatographic media under defined conditions, and the position to which a polypeptide migrates in a polyacrylamide gel under defined conditions. The characteristics can be determined empirically or can be predicted based on known information about the polypeptide, for example, sequence information.

As used herein, the term “characteristics associated with a polypeptide” refers to physicochemical properties of a polypeptide and/or any fragment of the polypeptide. As such, the characteristics associated with a polypeptide include specific characteristics of a parent polypeptide as well as characteristics of a fragment of the parent polypeptide which, because the fragment can be related to the polypeptide, are considered to be characteristics associated with the parent polypeptide. Such characteristics can be used to identify a polypeptide, for example, by comparison with a polypeptide identification index.

As used herein, the term “polypeptide” refers to a peptide or polypeptide of two or more amino acids. A polypeptide can also be modified by naturally occurring modifications such as post-translational modifications, including phosphorylation, lipidation, prenylation, sulfation, hydroxylation, acetylation, addition of carbohydrate, addition of prosthetic groups or cofactors, formation of disulfide bonds, proteolysis, assembly into macromolecular complexes, and the like.

A modification of a polypeptide, particularly ligand polypeptides, can also include non-naturally occurring derivatives, analogues and functional mimetics thereof generated by chemical synthesis, provided that such polypeptide modification displays a similar functional activity compared to the parent polypeptide. For example, derivatives can include chemical modifications of the polypeptide such as alkylation, acylation, carbamylation, iodination, or any modification that derivatizes the polypeptide. Such derivatized molecules include, for example, those molecules in which free amino groups have been derivatized to form amine hydrochlorides, p-toluene sulfonyl groups, carbobenzoxy groups, t-butyloxycarbonyl groups, chloroacetyl groups or formyl groups. Free carboxyl groups can be derivatized to form salts, methyl and ethyl esters or other types of esters or hydrazides. Free hydroxyl groups can be derivatized to form O-acyl or O-alkyl derivatives. The imidazole nitrogen of histidine can be derivatized to form N-im-benzylhistidine. Also included as derivatives or analogues are those polypeptides which contain one or more naturally occurring amino acid derivatives of the twenty standard amino acids, for example, 4-hydroxyproline, 5-hydroxylysine, 3-methylhistidine, homoserine, ornithine or carboxyglutamate, and can include amino acids that are not linked by peptide bonds.

A particularly useful polypeptide derivative includes modification of sulfhydryl groups, for example, the modification of sulfhydryl groups to attach affinity reagents such as an ICAT™ type reagent. A particularly useful modification of a polypeptide includes modification of polypeptides in a sample with a moiety having a stable isotope. For example, two different polypeptide samples can be separately labeled with moieties that are isotopically distinct, and such differentially labeled samples can be compared. Modification of polypeptides with stable isotopes is particularly useful for quantitating the relative amount of individual polypeptides in a sample.

As used herein, a “fragment” refers to any truncated form, either carboxy-terminal, amino-terminal, or both, of a parent polypeptide. Accordingly, a deletion of a single amino acid from the carboxy- or amino-terminus is considered a fragment of a parent polypeptide. A fragment generally refers to a deletion of amino acids at the N- and/or C-terminus but also includes modifications where a side chain is removed but the peptide bond remains. A fragment includes a truncated polypeptide that is generated, for example, by polypeptide cleavage using a chemical reagent, enzyme, or energy input. A fragment can result from a sequence-specific or sequence-independent cleavage event. Examples of reagents commonly used for cleaving polypeptides include enzymes, for example, proteases, such as thrombin, trypsin, chymotrypsin and the like, and chemicals, such as cyanogen bromide, acid, base, and o-iodobenzoic acid, as disclosed herein. A fragment can also be generated by a mass spectrometry method. Furthermore, a fragment can also result from multiple cleavage events such that a truncated polypeptide resulting from one cleavage event can be further truncated by additional cleavage events.

As used herein, the term “polypeptide identification index” refers to a collection of characteristics associated with a polypeptide sufficient to identify the polypeptide and distinguish it from other polypeptides in the index. A polypeptide identification index is therefore a collection of polypeptide identification codes for identifying a polypeptide based on characteristics of the polypeptide or a fragment thereof. A polypeptide identification index can be based on deduced characteristics associated with a polypeptide, for example, characteristics predicted based on sequence information such as genomic sequence, cDNA sequence, or EST databases. A polypeptide identification index can also be based on empirically determined characteristics, or a combination of deduced and empirically determined characteristics. An “annotated polypeptide (AP) index” refers to an index comprising at least one empirically determined characteristic for each of the polypeptides in the index, which can be determined, for example, by the methods disclosed herein. If desired, an AP index can be based on entirely empirically determined characteristics or a combination of deduced and empirically determined characteristics. The use of an annotated polypeptide index is particularly useful for identifying polypeptides modified by post-translational modifications, which can have characteristics unpredictable based on deduction from a sequence database alone.

A “polypeptide identification subindex” refers to a subset of a polypeptide identification index that contains fewer than all of the polypeptide identification codes of the polypeptide identification index. A subindex can contain, for example, five polypeptide identification codes from a polypeptide identification index of ten polypeptide identification codes, which is a subset of the entire index. Identification of a subindex can be useful, for example, for reducing the complexity of a search of a polypeptide identification index, similar to the reduction in complexity that can be applied to a polypeptide sample by the fractionation methods disclosed herein. Accordingly, a search of a subindex can be advantageous in requiring less computational time than required to search an entire index.

As used herein, the term “identification code” refers to a set of characteristics associated with a polypeptide that is sufficient to determine the identity of the polypeptide and distinguish the polypeptide from other polypeptides in a polypeptide identification index. An identification code is essentially an annotated peptide tag, or “bar code,” that can be used to identify a polypeptide.

The invention provides a method for identifying a polypeptide. The method includes the steps of determining two or more characteristics associated with a polypeptide or fragment thereof, one of the characteristics being the mass of a fragment of the polypeptide, wherein the fragment mass is determined by mass spectrometry; comparing the characteristics associated with the polypeptide to a polypeptide identification index such as an annotated polypeptide index; and identifying one or more polypeptides in the polypeptide identification index having the determined characteristics. The fragment mass can be determined at an accuracy in ppm of greater than 1 part per million (ppm) or at even lower accuracy (higher ppm). The method can further include determining one or more additional characteristics associated with the polypeptide and comparing the characteristics determined in each of the steps to the polypeptide identification index. Optionally, the steps of determining one or more additional characteristics associated with the polypeptide and comparing the characteristics determined in each step to the polypeptide identification index can be repeated one or more times, wherein a set of characteristics is determined that identifies a single polypeptide in the polypeptide identification index. The method can further include quantitating the amount of polypeptide in a sample. Furthermore, the methods can be used to measure the relative abundance in two or more different populations of polypeptides, that is, polypeptide mixtures, for example, populations of polypeptides in different samples.

The methods of the invention for identifying a polypeptide include determining characteristics associated with the polypeptide, or a fragment of the polypeptide. Characteristics associated with a polypeptide that are useful for identifying a polypeptide are those characteristics that can be reproducibly determined. Physicochemical properties of a polypeptide or fragment include, for example, atomic mass, amino acid composition, partial amino acid sequence, apparent molecular weight, pI, and order of elution on specific chromatographic media under defined conditions. Such characteristics determined to be associated with a polypeptide are used for the identification of the polypeptide. Methods for determining characteristics associated with a polypeptide are described in more detail below.

One of the characteristics particularly useful in methods of the invention is the mass of a polypeptide or a fragment or fragments thereof. A fragment of a polypeptide can be generated prior to or during the process of mass determination by mass spectrometry. A polypeptide fragment mass can therefore be the mass of a fragment of a polypeptide generated during polypeptide sample preparation, or can be the mass of a fragment generated by a polypeptide cleavage that occurred during mass spectrometry.

In the methods of the invention, the mass of a polypeptide fragment is determined by mass spectrometry, and can advantageously be determined in the absence of ion selection for producing fragment ions. The methods of the invention allow the identification of a polypeptide without the need for sequencing the polypeptide or fragment thereof. A polypeptide fragment mass can be determined using a variety of mass spectrometry methods known in the art, as described herein.

A variety of mass spectrometry systems can be employed in the methods of the invention for identifying a polypeptide. Mass analyzers with high mass accuracy, high sensitivity and high resolution include, but are not limited to, matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) mass spectrometers, ESI-TOF mass spectrometers and Fourier transform ion cyclotron resonance mass analyzers (FT-ICR-MS). Other modes of MS include an electrospray process with MS and ion trap. In ion trap MS, fragments are ionized by electrospray or MALDI and then put into an ion trap. Trapped ions can then be separately analyzed by MS upon selective release from the ion trap. Fragments can also be generated in the ion trap and analyzed. The ICAT™ type reagent labeled polypeptides that can be used in the methods of the invention can be analyzed, for example, by single stage mass spectrometry with a MALDI-TOF or ESI-TOF system.

If desired, a different MS analysis can be applied for generating a polypeptide identification index than for determining characteristics of an unknown polypeptide. For example, LC-MS/MS can be used for collecting data for the identification index and LC-ESI-TOF can be used for measurement of characteristics of an unknown polypeptide. It is understood that any MS methods and any combination of MS methods can be used so long as the samples are treated in a substantially similar manner and so long as the MS methods are compatible for comparison of masses determined by the different methods.

The methods of the invention can involve a polypeptide separation step followed by a mass analysis step. Polypeptide separation and mass analysis steps can be performed independently or can be coupled in an “on line” analysis method. Various modes of polypeptide separation techniques can be coupled to a mass analyzer. For example, polypeptides can be separated by chromatography using microcapillary HPLC, by solid phase extraction-capillary electrophoresis systems that can be coupled to a mass analyzer, or by gel electrophoresis methods. A specific example of a coupled polypeptide separation and mass analysis method is microcapillary HPLC coupled to an ESI-MS/MS system that is applied with dynamic exclusion on an ion trap MS.

Different types of mass spectrometry can be used for different applications of the methods of the invention. For certain applications, such as mass determination of a polypeptide fragment for generating a polypeptide identification index, a method that provides high accuracy, such as an accuracy of less than 1 part per million, can be used. However, the methods of the invention are advantageous in that MS of lower accuracy, that is, a higher ppm value, can be conveniently used without the need for the more expensive instrumentation required for higher accuracy determinations. For applications that involve high throughput analysis of a population of polypeptides, a lower accuracy mass determination can be sufficient. Lower accuracy mass determinations generally provide higher sample throughput because less time is required to make a mass determination.

The methods of the invention involving mass determinations can be conveniently performed at lower accuracy. For example, high mass accuracy instruments such as FTMS or FTICR MS can be used to determine mass at an accuracy of 0.2 ppm (Goodlett et al., Anal. Chem. 72:1918-1924 (2000)). The use of very high mass accuracy such as 0.1 ppm acts as a constraint. However, the methods of the invention are advantageous in that several characteristics associated with a polypeptide can be determined. When combined with additional characteristics, the masses can be determined at lower accuracy, that is, higher ppm. Determination of mass at lower accuracy allows the use of less expensive MS instruments, which are more widely available than FTMS. The mass determinations can be made at an accuracy in ppm of 1 part per million (ppm) or greater than 1 ppm, and can be made at an accuracy in ppm of 2.5 ppm or greater, of about 5 ppm or greater, about 10 ppm or greater, about 50 ppm or greater, about 100 ppm or greater, about 200 ppm or greater, about 500 ppm or greater, or even about 1000 ppm or greater, each of which successively requires less accuracy of the MS instrument. The methods of the invention advantageously allow the use of lower accuracy MS analysis in combination with other physicochemical characteristics, as disclosed herein, to identify a polypeptide in a sample. The accuracy of the MS measurement for a particular application can be readily determined by one skilled in the art, for example, depending on the complexity of the sample and/or index to be used.
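
As an illustration of what these accuracy values mean in absolute terms, the following minimal Python sketch converts a ppm tolerance into a mass window. The function names and the 1500.0 Da fragment mass are hypothetical and chosen only for illustration; they are not part of the disclosed methods.

```python
def ppm_window(mass, ppm):
    """Return the (low, high) mass window for a tolerance given in ppm."""
    delta = mass * ppm / 1e6
    return mass - delta, mass + delta


def within_ppm(measured, reference, ppm):
    """True if the measured mass lies within +/- ppm of the reference mass."""
    return abs(measured - reference) <= reference * ppm / 1e6


if __name__ == "__main__":
    # Hypothetical fragment mass of 1500.0 Da at several of the accuracies
    # listed above; the window widens in proportion to the ppm value.
    for ppm in (1, 10, 100, 1000):
        low, high = ppm_window(1500.0, ppm)
        print(f"{ppm:>5} ppm: {low:.4f} to {high:.4f} Da")
```

At 1 ppm the window for a 1500 Da fragment is only 0.0015 Da on either side, whereas at 1000 ppm it is 1.5 Da, which is why lower accuracy measurements generally need to be combined with additional characteristics to single out one polypeptide.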

The methods of the invention for identifying a polypeptide can involve determining the mass of a polypeptide fragment at an accuracy of greater than 1 part per million. Therefore, the method does not require an MS method having high accuracy. Accordingly, a lower-cost MS system can be employed in the methods of identifying a polypeptide. The adaptation of any mass spectrometer to a high throughput format, such as a 96-well plate or 384-spot plate format, or to an autoinjection system that allows unattended operation, is advantageous for increasing sample throughput.

In methods of the invention, the mass of a polypeptide or fragment thereof can be determined in the absence of ion selection for producing fragment ions. An overview of the strategy of a protein identification method is shown in FIG. 1. Polypeptides are optionally fractionated, for example, using polyacrylamide gel electrophoresis, and the polypeptides can further be fragmented into peptides. The peptides can further be optionally fractionated by chromatography. A chromatographic fraction, or bin (indicated by “*” in FIG. 1), is subjected to MS. Traditionally, an ion or dominant ions are selected in a collision cell for collision-induced dissociation (CID). Selection of a single ion is depicted in Q1 of FIG. 1. An ion is selected and then fragmented, as shown in Q3 of FIG. 1. In the absence of ion selection, instead of a single ion being selected, no selection of ions is applied but, rather, all of the ions are fragmented, leading to many peptide fragments. The peptide fragments are deconvoluted to determine which correspond to a particular parent polypeptide, and such information on the mass of a fragment of a polypeptide is a characteristic associated with the polypeptide (see FIG. 4). As shown in the bottom of FIG. 1, the fragment masses can be combined with any number of additional characteristics and compared to a protein identification index, for example, a sequence database, and the polypeptide is identified based on those determined characteristics.

A set of determined characteristics associated with a polypeptide is compared to the characteristics associated with a polypeptide in a polypeptide identification index. A polypeptide identification index is a collection of characteristics associated with individual polypeptides that uniquely identify and distinguish the polypeptides from other polypeptides annotated in the index. By comparing the set of determined characteristics associated with a polypeptide to a polypeptide identification index, one or more polypeptides in the polypeptide identification index that share the same characteristics can be identified. If more than one polypeptide is determined to have the same characteristics, additional constraints can be included, for example, the determination of one or more additional characteristics. A polypeptide identification index can be based on deduced characteristics of a polypeptide, for example, one or more characteristics deduced from genetic sequence databases, or can be determined empirically, as with the annotated peptide tag index described herein.

One exemplary method of generating an annotated peptide index is to: harvest proteins; label the proteins with an Isotope Coded Affinity Tag (ICAT™) type reagent; fractionate the proteins by molecular weight; digest the proteins to peptides (e.g., using trypsin); separate the peptides by ion exchange; purify each ion exchange fraction by affinity chromatography; analyze each affinity chromatography fraction by LC/MS/MS (or CE/MS/MS); identify all expressed proteins via database search of individual MS/MS peptide spectra; and generate a database of annotated peptide tags that constitute a unique barcode for an individual peptide, based on measured physicochemical properties, and thus for the parent protein of that peptide. It is understood that the above-described method, combinations of these steps, modifications thereof, or any methods suitable to allow the determination of characteristics associated with a polypeptide can be used to generate a polypeptide identification index containing at least one empirically determined characteristic, as described herein.

The methods of the invention can further include determining one or more additional characteristics associated with the polypeptide for comparison with a polypeptide identification index. The process of determining one or more additional characteristics associated with a polypeptide followed by comparing with a polypeptide identification index can be repeated until a single polypeptide is uniquely identified from the polypeptide identification index. Accordingly, if additional constraints are applicable, they can be included to identify a polypeptide by comparison to a polypeptide identification index (see FIG. 4C).

The number of characteristics sufficient to identify a polypeptide can be readily determined by one skilled in the art by comparing the determined set of characteristics with the polypeptide identification index. The identification of a single polypeptide in a polypeptide identification index refers to determining a set of characteristics that is sufficient to distinguish the polypeptide from another polypeptide in the polypeptide identification index. For example, if two determined characteristics match a single polypeptide in a polypeptide identification index, then the two characteristics are sufficient to identify a single polypeptide. Similarly, for a different polypeptide, three determined characteristics can be required to uniquely identify a polypeptide in the index. Accordingly, based on the characteristics determined for a polypeptide, a comparison is made to a polypeptide identification index. If a single polypeptide is identified, then a sufficient number of characteristics have been determined. If more than one polypeptide is identified, then one or more additional characteristics can be determined until a single polypeptide uniquely matches the determined characteristics, thereby allowing identification of the polypeptide. Therefore, one skilled in the art can readily determine if a sufficient number of characteristics, based on comparison to a particular polypeptide identification index, have been determined for a polypeptide to allow identification of a unique polypeptide in the polypeptide identification index.

The methods of the invention are advantageously based on the inclusion of selected constraints that allow more efficient identification of a polypeptide, particularly in complex samples containing numerous different polypeptides. The methods can also be advantageously used to identify multiple polypeptides simultaneously from a complex sample. Accordingly, rather than determining a large number of characteristics associated with different polypeptides, the methods can be performed in an iterative manner, if desired, with the inclusion of additional constraints as needed to identify a single polypeptide in a polypeptide identification index.

For example, polypeptides that are homologous generally have segments of high sequence identity. Such polypeptides can arise, for example, from polypeptides having similar function, splice variants of the same nucleic acid, and the like. Polypeptides having segments of high sequence identity can have in common several physicochemical characteristics, particularly in association with homologous fragments of the polypeptide. Polypeptides sharing a high degree of similarity can therefore have a similar or identical set of associated characteristics. For such similar polypeptides, a given set of characteristics sufficient to distinguish two dissimilar polypeptides can be insufficient for the identification of a single polypeptide in a polypeptide identification index when the polypeptides have regions of similarity. In such a case, one or more additional characteristics associated with the polypeptide can be determined, and the determination of additional characteristics can be repeated until the subject polypeptide can be distinguished from each other polypeptide in a polypeptide identification index. The methods of determining a set of characteristics associated with a polypeptide, comparing with a polypeptide identification index, and determining additional characteristics until a single polypeptide in a polypeptide identification index is identified can be applied to one or more polypeptides.

Thus, additional constraints, as needed to identify a polypeptide, can be considered. For example, if more than one polypeptide in a polypeptide identification index has a given set of characteristics, the identification of selected polypeptides of the polypeptide identification index, that is, a subset of polypeptides in the index or a subindex of the index, functions essentially as a constraint. Accordingly, a subsequent comparison to the polypeptide identification index can be made to the subindex, which can reduce the calculation time and provide a more efficient comparison, if desired. An additional constraint can then be considered, for example, an additional characteristic, and compared to the subindex, which can result in a reduction in the number of polypeptides having all of the determined characteristics. Such steps can be optionally repeated until a single polypeptide in the polypeptide identification index is identified. Such an approach is advantageous when determining the identity of multiple polypeptides simultaneously because only those characteristics sufficient to identify a polypeptide need be determined. The methods can thus readily accommodate the determination of the identity of a variety of polypeptides and the complexities associated with proteomics analysis without wasting resources on unnecessary data acquisition.
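
The iterative use of constraints described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the function name `narrow`, the dictionary layout of the index, the characteristic names, the use of absolute tolerances, and all numeric values are hypothetical.

```python
def narrow(index, measurements, tolerances):
    """Apply measured characteristics one at a time, shrinking the candidate
    subindex after each comparison, and stop once one candidate remains."""
    candidates = dict(index)           # start from the full index
    used = []                          # characteristics actually needed
    for name, value in measurements:   # in order of acquisition
        tol = tolerances.get(name, 0)
        # The surviving candidates form the subindex for the next comparison.
        candidates = {
            pid: chars
            for pid, chars in candidates.items()
            if name in chars and abs(chars[name] - value) <= tol
        }
        used.append(name)
        if len(candidates) <= 1:
            break
    return sorted(candidates), used


# Hypothetical three-entry index and absolute tolerances, for illustration.
INDEX = {
    "P1": {"fragment_mass": 1500.750, "parent_mw_kda": 42.0, "elution_fraction": 7},
    "P2": {"fragment_mass": 1500.752, "parent_mw_kda": 43.0, "elution_fraction": 12},
    "P3": {"fragment_mass": 1720.900, "parent_mw_kda": 42.5, "elution_fraction": 7},
}
TOLERANCES = {"fragment_mass": 0.01, "parent_mw_kda": 5.0, "elution_fraction": 1}

if __name__ == "__main__":
    measured = [("fragment_mass", 1500.751),
                ("parent_mw_kda", 42.4),
                ("elution_fraction", 12)]
    print(narrow(INDEX, measured, TOLERANCES))
```

In this hypothetical index, the fragment mass alone leaves two candidates, the approximate parent molecular weight does not separate them, and the elution fraction resolves the match to a single entry. Each later comparison is made only against the surviving subindex rather than the full index.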

The methods of the invention for generating a polypeptide identification index involve determining, for two or more polypeptides, a set of characteristics that can be used to identify the polypeptide. A set of characteristics that uniquely identifies a polypeptide in a polypeptide identification index defines a polypeptide identification code or “bar code” for the polypeptide. A polypeptide identification index can contain a variety of characteristics associated with an indexed polypeptide. Polypeptide characteristics contained in a polypeptide identification index can include polypeptide mass, amino acid composition, partial amino acid composition, for example, the presence of a particular amino acid, pI, order of elution on specific chromatographic media, and one or more polypeptide fragment masses. A polypeptide identification index can additionally include amino acid sequence, references to related polypeptides, database entries or literature, as well as other information relevant to the identification of a polypeptide. The user will know what types of information are useful for a polypeptide index and can include any physicochemical property or information relating to a polypeptide. A polypeptide identification index containing a large number of identification codes for a variety of polypeptides is particularly useful for identifying polypeptides in complex samples.

The methods of the invention directed to identifying a polypeptide are based on comparing characteristics determined for a polypeptide with a polypeptide identification index. A polypeptide identification index can be a commercially or publicly available database such as GenBank (www.ncbi.nlm.nih.gov/GenBank), in which one or more characteristics of a polypeptide are predicted, for example, amino acid composition, mass of a polypeptide or fragment thereof, and the like. In addition, a polypeptide identification index can be based on empirically determined characteristics determined by the methods described herein. In addition, a polypeptide identification index can be a combination of predicted and empirically determined characteristics, for example, like the annotated polypeptide (AP) index disclosed herein, also referenced as an annotated peptide tag (APT) index.

The set of characteristics associated with a polypeptide can be determined experimentally using a variety of methods. An exemplary method for polypeptide identification and/or determining characteristics for generating a polypeptide identification index is shown in FIG. 1 and described below. The method is useful for defining a polypeptide identification code because the method involves a series of steps, which allow the determination of characteristics associated with a polypeptide, the final step being mass determination of a polypeptide or fragment. The method can include: (i) polypeptide sample preparation; (ii) polypeptide tagging; (iii) optional polypeptide fractionation; (iv) polypeptide fragmentation; (v) polypeptide fragment separation; (vi) affinity isolation of tagged polypeptide fragments; (vii) high resolution polypeptide fragment separation; (viii) database searching; and (ix) polypeptide identification index construction (see Example I).

For polypeptide sample preparation, polypeptide samples for which quantitative proteome analysis is to be performed are isolated from the respective sources using standard protocols for maintaining the solubility of all the polypeptides. Polypeptide samples and preparation of polypeptide samples are discussed in detail below.

For polypeptide tagging, the polypeptides in the sample can be denatured, optionally reduced, and a chemically reactive group of the polypeptides is covalently derivatized with a chemical modification reagent. An exemplary reactive group is a sulfhydryl group that represents a side chain of a reduced cysteine residue, which can be derivatized by a reagent such as ICAT™ (Gygi et al., Nature Biotechnol. 17:994-999 (1999)) or IDEnT reagent (Goodlett et al., Anal. Chem. 72:1918-1924 (2000)). Other useful reactive groups include amino or carboxyl groups of polypeptides or specific post-translational modifications, including phosphate, carbohydrate or lipid. Any chemical reaction with specificity for a chemical group in the polypeptide can be applied in this step. The ICAT™ type reagent and IDEnT reagents and methods of use are described in more detail below.

For optional polypeptide fractionation, the mixture of tagged polypeptides can be fractionated using any polypeptide separation procedure. A fractionation procedure useful in methods of the invention is reproducible, allows polypeptides to remain soluble, and has a high sample and peak capacity. Any optional fractionation technique can be performed to enrich for low abundance proteins and/or to reduce the complexity of the mixture, while the relative quantities are maintained. Exemplary fractionation methods include, for example, sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE), chromatographic methods such as size exclusion, ion exchange, and the like, as disclosed herein. Polypeptide fractionation methods are described in more detail below.

For polypeptide fragmentation, the polypeptides in the sample mixture, or the polypeptides contained in each fraction if optional sample fractionation is employed, can be subjected to sequence specific cleavage, such as cleavage by trypsin. The use of sequence specific cleavage can be particularly useful because the termini of peptides cleaved by a sequence specific method can act as a constraint. However, it is understood that the cleavage method used to generate fragments need not be sequence specific, if desired. Methods useful for cleaving polypeptides in a sequence specific manner are described in more detail below.
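
To illustrate how the termini produced by sequence specific cleavage can act as a constraint, the following minimal Python sketch performs an in silico tryptic digest. It assumes the commonly used simplified rule that trypsin cleaves on the carboxyl side of lysine (K) or arginine (R) except when the next residue is proline (P); missed cleavages are ignored, and the function names and example sequence are hypothetical.

```python
def tryptic_peptides(sequence):
    """Split a protein sequence after K or R, unless followed by P
    (simplified trypsin rule; missed cleavages are not modeled)."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        # The C-terminal peptide of the protein need not end in K or R.
        peptides.append(sequence[start:])
    return peptides


def has_tryptic_c_terminus(peptide):
    """Terminus constraint: an internal tryptic peptide ends in K or R."""
    return peptide[-1] in "KR"


if __name__ == "__main__":
    for peptide in tryptic_peptides("MKTAYRPLGKQR"):  # hypothetical sequence
        print(peptide, has_tryptic_c_terminus(peptide))
```

For the hypothetical sequence shown, the R at position 6 is followed by P and is therefore not cleaved under this rule. Candidate sequences that do not end in K or R, other than a protein's own C-terminal peptide, can be excluded when matching fragments from a tryptic digest.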

For polypeptide fragment separation, the resulting polypeptide fragment mixtures can optionally be subjected to a first dimension peptide separation. Separation methods having a high sample capacity, at least moderate resolving power and highly reproducible separation patterns are useful in this step. Examples of first dimension separation methods include anion and cation ion exchange chromatographies. Chromatographic methods are described in more detail below. Although polypeptide fragment separation can optionally be performed, the methods can be advantageously used such that the characteristics of peptide fragments are measured in “bulk,” that is, the methods do not require peptide fragment purification to homogeneity.

For affinity isolation of tagged polypeptide fragments, polypeptide fragments can be isolated from each chromatographic fraction using an affinity reagent that binds to the polypeptide tag. For example, polypeptide fragments tagged with the ICAT™ reagent exemplified herein can be isolated using avidin or streptavidin affinity chromatography. An example of a useful affinity medium for isolation of ICAT™ labeled polypeptide fragments is monomeric avidin immobilized on polymer beads. If ICAT™ type reagents with affinity tags different from biotin are used, a corresponding affinity medium that binds the affinity tag is used.

For high resolution polypeptide fragment separation, liquid chromatography ESI-MS/MS can be used. The polypeptide fragment mixtures eluted from the affinity chromatography columns can be individually analyzed by automated LC-MS/MS using capillary reversed phase chromatography as the separation method (Yates et al., Methods Mol. Biol. 112:553-569 (1999)) and data dependent CID with dynamic exclusion (Goodlett et al., supra, 2000) as the mass spectrometric method.

For database searching, the sequences of polypeptide fragments for which suitable CID spectra were obtained are determined by searching a sequence database from the species under investigation. A sequence database search program such as Sequest (Eng, J. et al., J. Am. Soc. Mass. Spectrom. 5:976-989 (1994)) or a program with similar capabilities can be advantageously used to search a database.

For polypeptide identification index construction, the sequences of all the peptides that have been identified by the procedure described above can be entered in a database and annotated with characteristics that were generated during the above-described steps. These attributes can include, for example, partial amino acid composition, the approximate molecular mass of the parent polypeptide, which can be determined, for example, by the optional fractionation step, the order of elution from a first chromatography step, the order of elution from a second chromatography step, and the like.
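
One way to picture an entry of such an index is as a record holding the peptide sequence together with the attributes gathered during the preceding steps. The Python sketch below is illustrative only; the class name, field names, accession strings and values are hypothetical, and a real index could hold additional characteristics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnnotatedPeptide:
    sequence: str             # peptide sequence from the database search
    parent_id: str            # identifier of the parent polypeptide
    parent_mw_kda: float      # approximate mass from optional fractionation
    first_dim_fraction: int   # order of elution, first chromatography step
    second_dim_order: int     # order of elution, second chromatography step

    @property
    def contains_cysteine(self):
        """Partial amino acid composition: presence of a cysteine residue."""
        return "C" in self.sequence


def build_index(entries):
    """Group entries by peptide sequence. A sequence shared by several
    parent polypeptides keeps every entry so the ambiguity stays visible."""
    index = {}
    for entry in entries:
        index.setdefault(entry.sequence, []).append(entry)
    return index


if __name__ == "__main__":
    # Hypothetical entries, for illustration only.
    entries = [
        AnnotatedPeptide("TACYRPLGK", "ACC0001", 42.0, 7, 113),
        AnnotatedPeptide("GGQLVEK", "ACC0002", 65.5, 3, 48),
    ]
    index = build_index(entries)
    print(index["TACYRPLGK"][0].contains_cysteine)
```

Keeping a list per sequence, rather than a single entry, is a design choice made here so that peptides shared by homologous parent polypeptides are retained for resolution by additional characteristics.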

Collectively, a sufficient number of characteristics can be determined that distinguish each polypeptide fragment in a polypeptide identification index. The collection of characteristics that uniquely identifies a polypeptide represents a “bar code” or polypeptide identification code. Characteristics associated with an unknown polypeptide can be subsequently determined and compared to a previously generated polypeptide identification index. Alternatively, a polypeptide identification index can be determined along with the unknown polypeptide. However, the accumulation of information related to characteristics associated with a polypeptide and collection in an index is convenient for minimizing the experimental steps needed at the time of analyzing a sample. Therefore, a polypeptide identification code that is determined for a fragment of a polypeptide generated in a subsequent experiment can be used to identify a polypeptide in a sample by correlating the polypeptide identification code newly generated for the unknown polypeptide with a polypeptide identification code in the index.

For the identification of a polypeptide by comparison with a polypeptide identification index, a set of characteristics associated with a polypeptide can be determined generally as described above, or can be determined using an equivalent, modified or abbreviated method, or any method that allows for the determination of characteristics associated with a polypeptide. The number of characteristics sufficient to uniquely identify a polypeptide can be readily determined by those skilled in the art. The methods will generally include the identification of 2 or more characteristics, and can include 3 or more, 4 or more, 5 or more, 6 or more, 7 or more, 8 or more, 9 or more, or even 10 or more characteristics, or any number of characteristics so long as a sufficient number of characteristics are determined that distinguish each of the polypeptides in the index.

In generating a polypeptide identification index, characteristics associated with a polypeptide can be used to obtain the polypeptide sequence by searching a sequence database. For example, a partial amino acid sequence of a polypeptide or fragment optionally determined by mass spectrometry can be readily used to search a polypeptide or translated nucleic acid sequence database to identify a name or sequence identification number, such as an accession number, that uniquely describes a polypeptide. A polypeptide identification index can therefore contain polypeptide characteristics such as a common name, a numeric or alphanumeric identification code from a publicly available database, or any other identifying code selected for identifying a polypeptide in a polypeptide identification index.

To obtain sequence information from polypeptides that do not have a parent polypeptide or nucleic acid sequence in a database or that contain an unexpected post-translational modification that prevents identification, de novo sequencing can be performed. The identified amino acid sequence can be used to search a polypeptide or nucleic acid sequence database as described above. De novo sequencing can be performed using a variety of methods. A particularly useful method of de novo sequencing involves using an MS dataset generated for polypeptide identification.

It is understood that, although sequence information regarding a polypeptide or portion thereof, for example, determined by a method such as CID, can be included as a characteristic in a polypeptide identification index, the methods of the invention obviate the need to sequence an unknown polypeptide in order to identify it, although sequence information can be included in generating a polypeptide identification index, if desired. Accordingly, a polypeptide identification index can contain information on characteristics associated with a polypeptide that is additional to those characteristics sufficient to identify a polypeptide, for example, sequence information. By accumulating information regarding characteristics associated with a polypeptide in an index, the identity of a polypeptide can be readily determined in the absence of obtaining sequence information on the unknown polypeptide.

A chromatographic separation can be used to determine a characteristic of a polypeptide because the physicochemical properties of a polypeptide are reflected in the behavior of the polypeptide on chromatographic media. For example, a highly charged polypeptide will be eluted from an anion or cation exchange column under specific pH and/or salt conditions that differ from the pH and/or salt conditions under which an uncharged or oppositely charged polypeptide will elute. Therefore, a characteristic associated with a polypeptide can be the particular pH and/or salt condition under which the polypeptide is eluted from a chromatographic column. Similarly, conditions under which a polypeptide elutes from any type of chromatographic column can be determined. An order of elution or buffer condition at which a polypeptide is eluted from a column can be assigned a value to be annotated in a polypeptide index or to be used for comparing with corresponding values in a polypeptide index. A value can be, for example, relative position in an elution profile under defined conditions, a time of elution under a given set of conditions and flow rate, the relative time or order of elution in relation to an external standard, fraction number, salt concentration, pH, or any parameter that describes the behavior of a polypeptide on a particular chromatography column that can be reproducibly determined. Alternative methods include gel electrophoresis, for example, isoelectric focusing (IEF) or other analytical electrophoretic methods. Methods for fractionating polypeptides are well known to those skilled in the art (Scopes, Protein Purification: Principles and Practice, 3rd ed., Springer Verlag, New York (1993)).

Protein fractionation steps are useful in the methods of the invention for both reducing the complexity of a polypeptide sample prior to mass analysis of a polypeptide or fragment thereof and for determining characteristics associated with a polypeptide. Any of the well known fractionation steps, in addition to chromatographic fractionation described above, can be used to reduce the complexity of the sample and/or serve as a determined characteristic associated with a polypeptide. Exemplary fractionation steps include salt precipitation such as ammonium sulfate or precipitation with chemicals such as polyethylene glycol or polyethyleneimine, subcellular fractionation, tissue fractionation, immunoprecipitation, and the like (see Scopes, supra, 1993). A fractionation step can be used to reduce the complexity of a polypeptide population. For example, complexity reduction can be used in the isolation of a polypeptide subpopulation containing polypeptides tagged on a particular amino acid. Furthermore, other fractionation steps such as subcellular fractionation can also be applied to reduce the complexity of a sample and/or provide a characteristic useful for identifying a polypeptide. The fractionation steps can potentially provide biologically important information on the polypeptide, for example, whether the polypeptide is located in an organelle or is a nuclear protein, a membrane protein, and/or part of a signaling complex, and the like. Any fractionation step that advantageously reduces polypeptide population complexity can be applied in the methods of the invention.

A polypeptide fractionation step is useful in the methods of the invention for determining a characteristic associated with a polypeptide. For example, a protein fractionation method based on molecular weight can be used to determine a polypeptide molecular weight. Methods such as SDS-PAGE, commercially available gel elution or preparative cell systems (BIO-RAD), and size exclusion chromatography can be used to determine the apparent molecular weight of a polypeptide or fragment. Polypeptide and/or fragment molecular weight is a characteristic that can be included in a polypeptide identification index.

The particular set of characteristics determined for a polypeptide in generating a polypeptide identification index or for identifying a polypeptide can be selected by the user and will depend on the polypeptide sample, the methods used to prepare the polypeptide sample, the method of mass spectrometry employed and the preferences of the user. The characteristics of a polypeptide can be obtained in any temporal order. For example, polypeptide characteristics can be collected in an order that provides time efficiency or convenience or can be collected as dictated by a particular method selected for sample processing.

In generating a polypeptide index, sequence information, for example, determined by CID, as well as other characteristics of a polypeptide can be used, and the sequence information is particularly useful for correlating other characteristics of a polypeptide with a particular sequence to identify the polypeptide. However, the methods are advantageous in that, once a polypeptide identification index has been generated, obtaining sequence information on a polypeptide is not required. Instead, other characteristics sufficient to identify a polypeptide can be determined, for example, masses and/or ratios between peptides as well as other characteristics, and compared to a polypeptide identification index, which itself can include sequence information, thereby eliminating the need to sequence a polypeptide in order to identify it.

The methods of the invention for generating a polypeptide identification index involve determining a set of characteristics associated with a first and second polypeptide in which the determined characteristics are sufficient to distinguish the first and second polypeptides. Characteristics that are sufficient to distinguish the first and second polypeptides refer to a set of characteristics that can be uniquely attributed to a polypeptide so that the polypeptide identity can be determined unambiguously. In a case in which a set of characteristics is shared by two or more polypeptides, an additional characteristic that allows a polypeptide to be distinguished from another polypeptide is determined. Thus, the polypeptides represented in a polypeptide identification index can be distinguished from each other by the set of characteristics that identify each polypeptide.

The methods of the invention for identifying a polypeptide can be applied to a population of polypeptides in which two or more polypeptides are identified and can be conveniently used to identify multiple polypeptides in a sample simultaneously, if desired. Therefore, the method can be applied to a simple or complex polypeptide sample. A simple polypeptide sample can be, for example, a purified polypeptide sample containing one to several polypeptides. A complex sample can be, for example, a cell lysate or fraction containing a few to several hundred polypeptides or even thousands or tens of thousands of polypeptides. Using the methods described herein, the determination of polypeptide characteristics can require the collection of experimental data resulting from a series of steps, such as, for example, a series of chromatographic separations.

An exemplary process useful for organizing data obtained during analysis of complex polypeptide samples involves parceling information into theoretical “bins”. For example, an ICAT™ type-labeled mixture of polypeptides can be separated by size into a particular number of bins, which can be fractions eluting from a chromatography column, such as size exclusion, ion exchange, and the like, or segments of an SDS polyacrylamide gel. The polypeptides in each bin can be fragmented by a sequence specific cleavage method. Alternatively, analysis of polypeptides in a sample can be performed without fractionating the polypeptides so long as there has been a sufficient reduction in complexity of the sample to allow the identification of the polypeptide without fractionation. The peptide mixture, which has been fractionated into bins, can be further fractionated by various methods, including, for example, ion exchange chromatography, affinity chromatography such as is used with the isolation of ICAT™ type labeled peptides, or reverse phase liquid chromatography. Each bin of peptides can then be further binned by ion exchange chromatography and once again divided further into a particular number of bins. Each of these bins can be further separated by reverse phase chromatography and divided further into a particular number of bins, each of which can be analyzed by mass spectrometry. Hence each polypeptide analyzed by such a method will have five associated characteristics that can be represented, for example, as a 5-digit polypeptide identification code or “bar code” based on cysteine content, size, charge, hydrophobicity, and mass.
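The binning scheme described above can be sketched in a few lines of code. The following is a minimal illustration only, assuming ten bins per characteristic so that each bin fits in one digit; the function names, the mass range, and the choice to encode cysteine content as a single 0/1 digit are hypothetical and are not prescribed by the methods described herein.

```python
def bin_index(value, low, high, n_bins=10):
    """Map a measured value in [low, high] to an integer bin 0..n_bins-1."""
    if not low <= value <= high:
        raise ValueError(f"{value} is outside the range [{low}, {high}]")
    width = (high - low) / n_bins
    # The upper edge of the range falls into the last bin.
    return min(int((value - low) / width), n_bins - 1)


def bar_code(has_cysteine, size_bin, charge_bin, hydrophobicity_bin,
             mass, mass_range=(500.0, 4500.0)):
    """Build a 5-digit code: cysteine, size, charge, hydrophobicity, mass.

    size_bin, charge_bin and hydrophobicity_bin are the fraction numbers
    (0-9) in which the peptide was collected; mass_range is an assumed
    window used only for this illustration.
    """
    digits = (
        1 if has_cysteine else 0,
        size_bin,
        charge_bin,
        hydrophobicity_bin,
        bin_index(mass, *mass_range),
    )
    if any(not 0 <= d <= 9 for d in digits):
        raise ValueError("each bin must fit in a single digit (0-9)")
    return "".join(str(d) for d in digits)


# A cysteine-tagged peptide from size bin 3, ion exchange bin 7,
# reverse phase bin 2, with a measured mass of 1500.0.
print(bar_code(True, 3, 7, 2, 1500.0))
```

In such a sketch, two polypeptides that receive the same code would need an additional characteristic, or finer bins, to be distinguished.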

The methods of the invention for indexing characteristics associated with a large number of polypeptides use an amount of computer memory that is quadratic in sequence length. An advanced data structure such as, for example, suffix trees, can be used to reduce the requirements of computer memory (Gusfield, Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology, Cambridge University Press (1997)). Suffix trees are a compact data representation for all suffixes in a database of sequences. In their pure form, they can be constructed in linear time and stored in linear, instead of quadratic memory. Various modifications of suffix trees and traversal algorithms can be used to optimize computation time and use of computer memory associated with searching a polypeptide identification index.

A set of determined characteristics associated with a polypeptide is compared to a polypeptide identification index. Various search algorithms can be employed for matching values assigned to determined characteristics with annotated values in the index. A useful strategy for increasing the efficiency of database searching is the narrowing or “constraining” of the database. The term “constrain” when used in reference to a polypeptide identification index refers to a limitation that is applied to a polypeptide identification index in order to obtain a subindex containing a fraction of polypeptide identification codes corresponding to polypeptides having characteristics that match one or more characteristics of a polypeptide to be identified. A subindex can be generated when a group of polypeptides having a common characteristic is selected out of a polypeptide identification index or when a particular characteristic contained in a polypeptide identification code is used to omit one or more polypeptides from an index. A common characteristic can be a definite physicochemical characteristic such as a partial amino acid sequence or any other determined characteristic assigned a range of values. For example, a mass of a polypeptide fragment expressed as a range of values that account for the error in mass determination can serve as a constraint for selecting a subset of polypeptides or fragments of a particular mass.

One characteristic associated with a polypeptide that can be used to constrain a database is partial amino acid composition. The partial amino acid composition of a polypeptide includes the identification of a single amino acid present in a particular polypeptide or fragment thereof. A partial amino acid composition can be obtained, for example, by treating a polypeptide or fragment thereof with a reagent that results in the generation of a polypeptide or fragment that contains one or more defined amino acids. For example, a sequence specific polypeptide cleavage method will produce fragments with one or more known amino acid residues at the fragment carboxy- or amino-terminus. However, it is not necessary to know if a specific amino acid residue is located at the fragment carboxy- or amino-terminus of a polypeptide. Accordingly, cleavage of a polypeptide with a sequence specific protease indicates the presence of the corresponding amino acid and/or sequence in the polypeptide or peptide fragment thereof. Similarly, a reagent can be used to specifically modify or label one or more specific amino acid residues of a polypeptide or fragment. A polypeptide or fragment that contains such a modification or label will be known to contain a specific amino acid. Partial amino acid composition is a characteristic associated with a polypeptide that can be useful for constraining a polypeptide identification index to generate a polypeptide identification subindex.

The comparison of a set of determined characteristics with a polypeptide identification index can therefore involve a series of searches constrained by a determined characteristic of a polypeptide. For example, an initial search of parent polypeptide or fragment mass can be performed, resulting in the generation of a polypeptide identification subindex containing polypeptide and fragment mass values that are similar to, that is, within the range of instrument error of, the mass of the polypeptide or fragment thereof to be identified. A second characteristic to be searched against the generated polypeptide identification subindex, such as the presence of a cysteine residue in the polypeptide to be identified, provides a further constraint and can be used to generate a further polypeptide identification subindex.
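The series of constrained searches described above can be illustrated as two successive filters. The sketch below is an assumption-laden example rather than a prescribed implementation: the `IndexEntry` record, the function names, the use of a parts-per-million tolerance for instrument error, and the index entries themselves (sequences and masses are made up for illustration) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexEntry:
    polypeptide_id: str
    fragment_sequence: str  # annotated in the index; not measured on the unknown
    fragment_mass: float    # annotated fragment mass


def constrain_by_mass(index, observed_mass, tolerance_ppm):
    """Keep entries whose mass lies within the instrument error window."""
    window = observed_mass * tolerance_ppm / 1e6
    return [e for e in index if abs(e.fragment_mass - observed_mass) <= window]


def constrain_by_residue(index, residue):
    """Keep entries annotated as containing a given amino acid residue."""
    return [e for e in index if residue in e.fragment_sequence]


# Illustrative index; the masses are invented, not computed from the sequences.
index = [
    IndexEntry("P1", "ACDEK", 1000.500),
    IndexEntry("P2", "GGLEK", 1000.505),
    IndexEntry("P3", "MCTPR", 1200.600),
]

# First constraint: fragment mass within 10 ppm of the observed value.
subindex = constrain_by_mass(index, observed_mass=1000.502, tolerance_ppm=10)
# Second constraint: the unknown fragment is known to contain cysteine.
final = constrain_by_residue(subindex, "C")
print([e.polypeptide_id for e in final])
```

Applying the cheaper or more selective constraint first keeps each later search confined to a smaller subindex.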

The determined mass of a polypeptide or fragment is a characteristic that can be advantageously used to constrain such a database search to increase the efficiency of searching a large database. For example, tandem MS spectra can be analyzed using software such as SEQUEST, which generates a list of peptides in a database that match the molecular mass of the unknown peptide on which CID was carried out and then compares the observed CID spectrum of the unknown with that for all possible isobars (Eng, J. et al., J. Am. Soc. Mass. Spectrom. 5:976-989, (1994)). Therefore, the set of peptides having a molecular mass similar to the polypeptide fragment being analyzed generated by this type of search provides a subset of possible parent polypeptides represented by the polypeptide fragment. The subset can then be searched using, for example, a partial amino acid composition, to identify the parent polypeptide. Those skilled in the art will know or can readily determine appropriate correlation score parameters for a particular search using software applications such as SEQUEST.

The method of comparing two or more polypeptide populations employs a method for quantitatively distinguishing the polypeptide populations, such as the method described herein using an ICAT™ type reagent, and is illustrated in FIG. 3. Two or several chemically identical but differentially isotopically labeled ICAT™ type reagents can be used at this step. Therefore, although FIG. 3 depicts two samples, multiple samples can be compared using the methods described herein. The samples depicted in FIG. 3 contain polypeptide populations harvested from the same sample type that differ from each other in growth condition. Exemplary differential growth conditions can include growth under different metabolic conditions or cells at different metabolic states, comparison of a normal and disease sample such as a tumor sample, comparison of untreated cells versus cells treated with a pharmacological agent, and the like.

As shown in FIG. 3, the samples are independently labeled using an ICAT™ type reagent, combined, and characteristics of the polypeptides and corresponding fragments are determined, as described herein. Polypeptides and fragments generated during this process can be analyzed using single stage mass spectrometry, rather than by MS/MS, if desired, to increase sample throughput and sensitivity (Goodlett et al., supra, 2000). The characteristics determined for polypeptides and fragments are used to determine polypeptide identities, as described herein. Subsequently, the mass spectra can be examined for pairs of peptide ions that co-fractionated throughout the process and that have a mass difference that precisely corresponds to the mass difference encoded in the ICAT™ type reagent. The relative signal intensities of the two peaks indicate the relative abundance of the fragment polypeptides and therefore indicate the relative abundance of the corresponding parent polypeptide initially present in the sample. Therefore, a method for comparing the polypeptides contained in two polypeptide samples can involve the generation of two reference polypeptide indices that contain, for each polypeptide identified, a quantitative determination of polypeptide amount in addition to a polypeptide identification code.
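The examination of spectra for isotopically encoded peak pairs can be sketched as follows. This is a minimal example under stated assumptions: the peaks are taken to be neutral (charge-deconvoluted) masses from a single co-eluting fraction, each peptide is assumed to carry exactly one tag, and the mass difference of 8.05 used in the example is an assumed value for a hydrogen/deuterium reagent pair rather than a figure given herein. The function name is hypothetical.

```python
def find_isotope_pairs(peaks, mass_delta, tolerance):
    """Return (light_mass, heavy_mass, heavy/light intensity ratio) tuples.

    peaks is a list of (mass, intensity) for one co-eluting fraction.
    mass_delta is the mass difference encoded by the reagent pair and
    tolerance is the allowed deviation from that difference.
    """
    peaks = sorted(peaks)
    pairs = []
    for i, (m_light, i_light) in enumerate(peaks):
        for m_heavy, i_heavy in peaks[i + 1:]:
            gap = m_heavy - m_light
            if gap > mass_delta + tolerance:
                break  # peaks are sorted, so later peaks are even further away
            if abs(gap - mass_delta) <= tolerance and i_light > 0:
                pairs.append((m_light, m_heavy, i_heavy / i_light))
    return pairs


# Illustrative spectrum: one labeled pair and one unpaired peak.
spectrum = [(1000.00, 200.0), (1008.05, 100.0), (1500.00, 50.0)]
for light, heavy, ratio in find_isotope_pairs(spectrum, mass_delta=8.05,
                                              tolerance=0.02):
    print(light, heavy, ratio)
```

The intensity ratio of each pair serves as the relative abundance of the corresponding parent polypeptide in the two samples; peptides carrying more than one tag would require multiples of the mass difference to be searched as well.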

An alternative method for comparing two or more polypeptide populations is the comparison of one or more polypeptide samples to a previously determined polypeptide reference index. A set of characteristics of one or more polypeptides in a polypeptide sample can be identified and compared to a reference polypeptide identification index to determine the identities of one or more polypeptides and comparative quantities of the identified polypeptides. If desired, an unknown sample can be compared to a reference sample using the above-described quantitative methods to determine relative expression levels of the polypeptides. A reference sample can be, for example, a sample from a healthy individual or a sample from a control condition useful for comparing to the physiological state of another sample such as a disease sample.

A polypeptide identification index that contains quantitative determinations of polypeptide amount is considered to be a “polypeptide profile” of the particular sample used to generate the index. A polypeptide profile, as used herein, is a set of polypeptide identification codes that includes polypeptide amount, generated for a specific sample.

A polypeptide profile is useful in methods of proteomics because such a profile can be used to distinguish between different conditions or states of cells, tissues, organs, and organisms. The polypeptides expressed by a cell or tissue at a particular time can be used to define the state of the cell or tissue at the time of measurement. Therefore, quantitative and qualitative differences between the polypeptide profiles of the same cell type in different states can be used to diagnose the respective states. Examples of such comparisons include normal versus tumor cells, cells at different metabolic states and untreated cells versus cells treated with specific pharmacological agents. The differences between two polypeptide profiles can be described as a “differential polypeptide profile”.
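A differential polypeptide profile of the kind just described can be illustrated by comparing two profiles held as mappings from polypeptide identification code to amount. The sketch below is illustrative only; the function name, the status labels, the two-fold threshold, and the example codes and amounts are assumptions made for the example.

```python
def differential_profile(reference, sample, min_fold=2.0):
    """Compare two polypeptide profiles (identification code -> amount).

    Returns code -> (reference amount, sample amount, status) for codes
    that are present in only one profile (qualitative difference) or whose
    amounts differ by at least min_fold (quantitative difference).
    """
    result = {}
    for code in sorted(set(reference) | set(sample)):
        ref = reference.get(code, 0.0)
        obs = sample.get(code, 0.0)
        if ref == 0.0 and obs == 0.0:
            continue
        if ref == 0.0:
            status = "only in sample"
        elif obs == 0.0:
            status = "only in reference"
        else:
            fold = obs / ref
            if fold >= min_fold:
                status = "increased"
            elif fold <= 1.0 / min_fold:
                status = "decreased"
            else:
                continue  # within the threshold: not reported as a difference
        result[code] = (ref, obs, status)
    return result


# Illustrative profiles keyed by 5-digit identification codes.
normal = {"13722": 10.0, "10415": 5.0, "11203": 4.0}
tumor = {"13722": 40.0, "10415": 5.5, "10988": 3.0}
for code, row in differential_profile(normal, tumor).items():
    print(code, row)
```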

A differential polypeptide profile is useful for analyzing quantitative changes in the polypeptides contained in samples derived from different cell types such as, for example, cancerous and normal cells, stimulated and unstimulated cells, or from different tissue samples of clinical interest.

The methods of the invention for generating differential polypeptide profiles are applicable to the analysis of changes in the polypeptide profiles in samples such as body fluids. A differential polypeptide profile is determined by comparing the polypeptide profiles of two specimens, for example, a normal polypeptide profile to a disease-related polypeptide profile. For example, a polypeptide profile representative of a normal specimen state can be generated and compared to the profile of a specimen suspected to be in an abnormal or disease state. Alternatively, a reference polypeptide profile representative of a disease state can be compared with the profile of a specimen from an individual having or suspected of having a particular disease state. A reference polypeptide profile representative of a normal or disease state can be determined using a specimen from a particular individual or a population of individuals.

If desired, analysis can be performed on a population rather than an individual, particularly a reference population or control population. Such a reference population can be used for comparison of an unknown sample. One skilled in the art can determine an appropriate reference population based on the particular application of the methods of the invention. The methods of the invention can be used to generate a differential polypeptide profile that identifies the differences in polypeptide expression between two samples, for example, a normal and disease state. The size of the reference population depends on the criteria used to select reference individuals. Depending on the selection criteria and particular application of the methods of the invention, a reference population can be a relatively small number to a large number of individuals, including thousands of individuals.

The large-scale analysis of samples from patients having specifically diagnosed diseases or exhibiting signs or symptoms of a disease is useful for identifying clinical markers or constellations of markers for the respective conditions. Samples from an individual having a disease can be used to generate a qualitative and/or quantitative polypeptide identification index for that disease. Similarly, the comparative analysis of polypeptides contained in samples from patients undergoing therapeutic treatment can be used to identify diagnostic markers or constellations of markers indicating the success or failure of the treatment. The methods are also applicable to the analysis of such samples on a systematic, population-wide scale for the discovery or screening of markers or constellations of markers useful for indicating the predisposition of individuals for certain clinical conditions.

The invention further provides a method for generating a polypeptide identification index. The method includes steps of (a) determining a set of two or more characteristics associated with a first polypeptide, or a peptide fragment thereof, one of the characteristics being the mass of a peptide fragment of the polypeptide, the peptide fragment mass being determined by mass spectrometry; (b) repeating step (a) for a second polypeptide; and (c) optionally determining one or more additional characteristics associated with the first and second polypeptides, wherein the determined characteristics are sufficient to distinguish the first and second polypeptides, thereby generating a polypeptide identification index for the first and second polypeptides. The method can further comprise repeating steps (a) through (c) one or more times for a different polypeptide, wherein the determined characteristics are sufficient to distinguish each of the polypeptides, thereby generating a polypeptide identification index for each of the polypeptides. As with determining characteristics of a polypeptide, the polypeptide identification index can be determined with any of the methods disclosed herein for determining characteristics associated with a polypeptide. The polypeptide identification index can optionally be obtained by determining mass in the absence of ion selection.
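The requirement that the determined characteristics be sufficient to distinguish each polypeptide can be checked mechanically while an index is assembled. The following sketch is illustrative only: the function name, the representation of a set of characteristics as a tuple, and the example values (a rounded fragment mass, a cysteine flag, and an ion exchange bin) are hypothetical, and a practical index would compare masses within a tolerance rather than for exact equality.

```python
from collections import defaultdict


def build_index(records):
    """Group polypeptides by their set of determined characteristics.

    records is an iterable of (polypeptide_id, characteristics).
    Returns (index, ambiguous): index maps each uniquely attributed set of
    characteristics to one polypeptide; ambiguous lists sets shared by two
    or more polypeptides, for which an additional characteristic is needed.
    """
    by_code = defaultdict(list)
    for polypeptide_id, characteristics in records:
        by_code[tuple(characteristics)].append(polypeptide_id)
    index = {code: ids[0] for code, ids in by_code.items() if len(ids) == 1}
    ambiguous = {code: ids for code, ids in by_code.items() if len(ids) > 1}
    return index, ambiguous


# Steps (a) and (b): fragment mass plus one further characteristic.
records = [("A", (1000, True)), ("B", (1000, True)), ("C", (1200, False))]
index, ambiguous = build_index(records)
print(ambiguous)  # A and B share a set of characteristics

# Step (c): an additional characteristic (here an ion exchange bin).
extended = [("A", (1000, True, 3)), ("B", (1000, True, 7)),
            ("C", (1200, False, 3))]
index, ambiguous = build_index(extended)
print(index, ambiguous)
```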

The methods of the invention for generating a polypeptide identification index involve the determination of polypeptide or fragment mass in the absence of ion selection for producing fragment ions and can further involve the determination of a fragment mass at an accuracy of greater than 1 part per million, or even lower accuracy (higher ppm), if desired. Although a polypeptide identification index can contain a polypeptide amino acid sequence, it is not required that a polypeptide or fragment be sequenced for practicing the methods of the invention for generating a polypeptide identification index.

The invention further provides a method for identifying a polypeptide. The method includes steps of (a) simultaneously determining the mass of a subset of parent polypeptides from a population of polypeptides and the mass of peptide fragments of the subset of parent polypeptides; (b) comparing the determined masses to a polypeptide identification index; and (c) identifying one or more polypeptides of the polypeptide identification index having the determined masses. The method can further comprise the steps of (d) determining one or more additional characteristics associated with one or more of the parent polypeptides; (e) comparing the characteristics determined in step (a) and step (d) to the polypeptide identification index; and (f) optionally repeating steps (d) and (e) one or more times, wherein a set of characteristics is determined that identifies a parent polypeptide as a single polypeptide in the polypeptide identification index.

The method of the invention for identifying a polypeptide includes a step of simultaneously determining the mass of a subset of parent polypeptides from a population of polypeptides and the mass of polypeptide fragments of the subset of parent polypeptides (see Example II). The simultaneous determination of masses of a subset of parent polypeptides refers to the acquisition of a subset of parent polypeptide mass values from a single sample containing a polypeptide population. The term “simultaneous” is intended to mean that the masses of parent polypeptides and polypeptide fragments are determined concurrently such that the MS method used can acquire masses of parent polypeptides and corresponding fragments in a time frame sufficient that parent and fragment masses can be correlated to the same subset of polypeptides. For example, the polypeptides being sampled in an MS method will change over time as different subsets of polypeptides elute from a chromatographic column as dictated by the flow rate of the column. A simultaneous determination occurs during a time period before a particular subset of polypeptides is altered due to the introduction of an additional polypeptide or loss of a polypeptide of the polypeptide subset that occurs as a result of on-line sampling methods.

Simultaneous determination of the mass of a subset of polypeptides can be performed, for example, in the absence of selection of a single ion for mass determination. For example, several polypeptides can be selected rather than a single ion (Masselon et al., Anal. Chem. 72:1918-1924 (2000)). In methods of the invention, preferably greater than 5 ions, for example, 6 ions, 7 ions, 8 ions, 9 ions, 10 ions, or even greater numbers of ions are selected. In such a case, the polypeptide identification index is preferably an annotated polypeptide index.

Alternatively, simultaneous determination of masses of a subset of polypeptides can be performed in the absence of single ion selection or in the absence of ion selection in a source region (see FIG. 2). In such a case, the fragment ions obtained are deconvoluted to determine which ions are associated with a particular parent polypeptide and therefore useful as a characteristic associated with the parent polypeptide. Such a method can be useful for detecting and identifying less abundant ions that are not selected for fragmentation in standard MS methods.

A mass spectrometry method useful for obtaining polypeptide and polypeptide fragment masses simultaneously is in-source CID on ESI-TOF. The method involves continuously alternating between parent ion and in-source CID scans. In-source CID scans provide specific fragment ions traceable to a given parent ion even in the presence of multiply fragmented parent ions. Parent-fragment ion lineages can be determined by deconvolution of mass spectrometry data using appropriate software. MS instruments providing lower accuracy measurements, for example, ESI-TOF, can be used advantageously for providing unique constraints for polypeptide identification.
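The deconvolution procedure itself is not specified above. One conceivable approach, offered purely as an assumption for illustration, is to assign each fragment ion to the parent ion whose intensity rises and falls with it across successive alternating scans, since a fragment can only be observed while its parent is eluting. The sketch below uses a Pearson correlation of intensity traces; the function names, the correlation threshold, and the example traces are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length intensity traces."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat trace carries no elution information
    return cov / sqrt(var_x * var_y)


def assign_fragments(parent_traces, fragment_traces, min_r=0.9):
    """Assign each fragment to the parent with the best-correlated trace.

    parent_traces holds intensities from the parent ion scans and
    fragment_traces those from the interleaved in-source CID scans,
    each sampled at the same successive time points. Fragments that do
    not correlate with any parent at min_r or better are left unassigned.
    """
    lineages = {parent: [] for parent in parent_traces}
    for fragment, f_trace in fragment_traces.items():
        best_parent, best_r = None, min_r
        for parent, p_trace in parent_traces.items():
            r = pearson(p_trace, f_trace)
            if r >= best_r:
                best_parent, best_r = parent, r
        if best_parent is not None:
            lineages[best_parent].append(fragment)
    return lineages


# Illustrative traces over six scan pairs.
parents = {"P1": [0, 5, 10, 5, 0, 0], "P2": [0, 0, 2, 8, 10, 4]}
fragments = {"f1": [0, 2, 4, 2, 0, 0], "f2": [0, 0, 1, 4, 5, 2],
             "f3": [1, 1, 1, 1, 1, 1]}
print(assign_fragments(parents, fragments))
```

Parents that co-elute with nearly identical profiles could not be separated by such a correlation alone and would require a further constraint, such as comparison of the fragment masses to the polypeptide identification index.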

The methods of the invention involve determining characteristics associated with a polypeptide. A sample containing a polypeptide can be as simple as an isolated polypeptide mixture containing a polypeptide or as complex as a sample containing all of the polypeptides expressed in an organism. Furthermore, a sample can be fractionated, if desired, using the methods disclosed herein.

A polypeptide can be in a sample isolated from a variety of sources. For example, a polypeptide sample can be prepared from any biological fluid, cell, tissue, organ or portion thereof, or any species of organism. A sample can be present in an individual and obtained or derived from the individual. For example, a sample can be a histologic section of a specimen obtained by biopsy, or cells that are placed in or adapted to tissue culture. A sample further can be a subcellular fraction or extract. A sample can be prepared by methods known in the art suitable for maintaining polypeptide solubility, such as those described herein.

A specimen refers specifically to a sample obtained from an individual. A specimen can be obtained from an individual as a fluid or tissue specimen. For example, a tissue specimen can be obtained as a biopsy such as a skin biopsy, tissue biopsy or tumor biopsy. A fluid specimen can be blood, serum, urine, saliva, cerebrospinal fluid or other bodily fluids. A fluid specimen is particularly useful in methods of the invention since fluid specimens are readily obtained from an individual. Methods for collection of specimens are well known to those skilled in the art (see, for example, Young and Bermes, in Tietz Textbook of Clinical Chemistry, 3rd ed., Burtis and Ashwood, eds., W.B. Saunders, Philadelphia, Chapter 2, pp. 42-72 (1999)).

A polypeptide to be used in the methods of the invention can be obtained from a source such as a cell, tissue, organ or organism. A variety of methods are known in the art for lysing a cell. Cells can be lysed, for example, by denaturants, one or more cycles of freezing and thawing, or sonication. Following lysis, the polypeptide mixture can be subjected to a fractionation to remove, for example, nucleic acid or lipid, or to remove intact subcellular fractions or organelles. Methods of lysing and fractionating cells are well known to those skilled in the art (see Scopes, supra, 1993).

For identification of a polypeptide, a sample or specimen can be contained in a buffer suitable for maintaining polypeptide solubility such as, for example, a buffer containing a detergent, including denaturants such as sodium dodecyl sulfate (SDS). Denaturants useful for solubilizing polypeptides include, for example, guanidine-HCl, guanidine-isothiocyanate and urea. In the case of guanidine-isothiocyanate, as with treatment with any reagent that can covalently modify a polypeptide, such reagents can be used so long as the polypeptide identification index to which the sample is to be compared has been prepared in substantially the same manner as the sample sufficient for comparison of the same polypeptide. Other denaturants well known in the art can be similarly used for solubilizing polypeptides. In addition, reducing agents such as dithiothreitol (DTT), dithioerythritol (DTE), or mercaptoethanol can be included.

The methods of the invention can optionally involve protein fractionation steps. Protein fractionation refers to any method useful for removing one or more polypeptides from a polypeptide population. Fractionation can include, for example, a centrifugation step that separates soluble from insoluble components, a method of electrophoresis, a method of chromatography, or any of the methods disclosed herein. For chromatographic separation, a wide variety of chromatographic media well known in the art can be used to separate polypeptide populations. For example, polypeptides can be separated based on size, charge, hydrophobicity, binding to particular dyes and other moieties associated with chromatographic media. Size exclusion, gel filtration and gel permeation resins are useful for polypeptide separation based on size. Examples of chromatographic media for charge-based separation are strong and weak anion exchange and strong and weak cation exchange resins. Hydrophobic or reverse phase chromatography can also be used.

Affinity chromatography can also be used including, for example, dye-binding resins such as Cibacron blue, substrate analogs, including analogs of cofactors such as ATP, NAD, and the like, ligands, specific antibodies, either polyclonal or monoclonal, and the like. Exemplary affinity resins include resins that bind to specific moieties that can be incorporated into a polypeptide, such as an avidin resin that binds to a biotin tag on a polypeptide, as disclosed herein. The resolution and capacity of particular chromatographic media are known in the art and can be determined by those skilled in the art. The usefulness of a particular chromatographic separation for a particular application can similarly be assessed by those skilled in the art.

Those of skill in the art will be able to determine the appropriate chromatography conditions for a particular sample size or composition and will know how to obtain reproducible results for chromatographic separations under defined buffer, column dimension, and flow rate conditions. All protein fractionation methods can optionally include the use of an internal standard for assessing the reproducibility of a particular chromatographic application. Appropriate internal standards will vary depending on the chromatographic medium. Those skilled in the art will be able to determine an internal standard applicable to a method of chromatography.

Polypeptide tagging is useful in the methods of the invention for reducing polypeptide sample complexity, providing a database search constraint, and enabling quantitative polypeptide comparisons. The complexity of a polypeptide sample can be reduced by tagging a polypeptide with an affinity tag that can be used for isolating a subpopulation of polypeptides that contain the tag. For example, a population of polypeptides and fragments can be labeled on a relatively rare amino acid, such as cysteine, and a subpopulation of polypeptides and fragments containing the tag can be isolated. The subpopulation of polypeptides and fragments isolated in this manner will thus contain a known amino acid. As described herein, a known amino acid constitutes a partial amino acid composition which is useful for constraining a database search. Quantitative polypeptide comparisons can be performed by differentially tagging two polypeptides or polypeptide populations. An ICAT™ type affinity reagent, described in more detail below, is particularly useful for this purpose, although any other method of polypeptide tagging can be similarly applied to polypeptide comparisons.

Polypeptide tagging can be performed using a variety of methods known in the art. A reagent for polypeptide tagging or modification can contain various components that are separated by linker regions. Components of a polypeptide tagging reagent can include a reactive group that modifies a specific chemical group of a polypeptide, a moiety that can be detected, such as by mass spectrometry, and an affinity tag to be used for polypeptide isolation. Two examples of polypeptide tagging reagents, ICAT™ type and IDEnT, are described in detail below, although any type of polypeptide tag can be used, if desired.

The methods of the invention for quantitatively comparing two polypeptide populations involve the use of the isotope-coded affinity tag (ICAT™) method (Gygi et al., Nature Biotechnol. 17:994-999 (1999), which is incorporated herein by reference). An ICAT™ type reagent can additionally be useful for polypeptide tagging applications that do not involve quantitative comparisons. The ICAT™ type reagent method uses an affinity tag that can be differentially labeled with an isotope that is readily distinguished using mass spectrometry, for example, hydrogen and deuterium. The ICAT™ type affinity reagent consists of three elements, an affinity tag, a linker and a reactive group.

One element of the ICAT™ type affinity reagent is an affinity tag that allows isolation of peptides coupled to the affinity reagent by binding to a cognate binding partner of the affinity tag. A particularly useful affinity tag is biotin, which binds with high affinity to its cognate binding partner avidin, or related molecules such as streptavidin, and is therefore stable to further biochemical manipulations. Any affinity tag can be used so long as it provides sufficient binding affinity to its cognate binding partner to allow isolation of peptides coupled to the ICAT™ type affinity reagent. An affinity tag can also be used to isolate a tagged peptide with magnetic beads or other magnetic format suitable to isolate a magnetic affinity tag. In the ICAT™ type reagent method, or any other method of affinity tagging a peptide, covalent trapping can be used to bind the tagged peptides to a solid support, if desired.

A second element of the ICAT™ type affinity reagent is a linker that can incorporate a stable isotope. The linker has a sufficient length to allow the reactive group to bind to a specimen polypeptide and the affinity tag to bind to its cognate binding partner. The linker also has an appropriate composition to allow incorporation of a stable isotope at one or more atoms. A particularly useful stable isotope pair is hydrogen and deuterium, which can be readily distinguished using mass spectrometry as light and heavy forms, respectively. Any of a number of isotopic atoms can be incorporated into the linker so long as the heavy and light forms can be distinguished using mass spectrometry. Exemplary linkers include the 4,7,10-trioxa-1,13-tridecanediamine based linker and its related deuterated form, 2,2′,3,3′,11,11′,12,12′-octadeutero-4,7,10-trioxa-1,13-tridecanediamine, described by Gygi et al. (supra, 1999). One skilled in the art can readily determine any of a number of appropriate linkers useful in an ICAT™ type affinity reagent that satisfy the above-described criteria.

The third element of the ICAT™ type affinity reagent is a reactive group, which can be covalently coupled to a polypeptide in a specimen. Methods for modifying side chain amino acids in polypeptides are well known to those skilled in the art (see, for example, Glazer et al., Laboratory Techniques in Biochemistry and Molecular Biology: Chemical Modification of Proteins, Chapter 3, pp. 68-120, Elsevier Biomedical Press, New York (1975); Pierce Catalog (1994), Pierce, Rockford Ill.). Any of a variety of reactive groups can be incorporated into an ICAT™ type affinity reagent so long as the reactive group can be covalently coupled to a polypeptide. For example, a polypeptide can be coupled to the ICAT™ type affinity reagent via a sulfhydryl reactive group, which can react with free sulfhydryls of cysteine or reduced cystines in a polypeptide. An exemplary sulfhydryl reactive group includes an iodoacetamido group, as described in Gygi et al. (supra, 1999). Other exemplary sulfhydryl reactive groups include maleimides, alkyl and aryl halides, α-haloacyls and pyridyl disulfides. If desired, the polypeptides can be reduced prior to reacting with an ICAT™ type affinity reagent, which is particularly useful when the ICAT™ type affinity reagent contains a sulfhydryl reactive group.

A reactive group can also react with amines such as those of Lys; examples of such amine-reactive groups include imidoesters and N-hydroxysuccinimidyl esters. A reactive group can also react with carboxyl groups found in Asp or Glu, or the reactive group can react with other amino acids such as His, Tyr, Arg, and Met. A reactive group can also react with a phosphate group for selective labeling of phosphopeptides, or with other covalently modified peptides, including glycopeptides, lipopeptides, or any of the covalent polypeptide modifications disclosed herein. One skilled in the art can readily determine conditions for modifying specimen molecules by using various reagents, incubation conditions and time of incubation to obtain conditions optimal for modification of specimen molecules for use in methods of the invention.

The ICAT™ type reagent method is based on derivatizing a specimen molecule such as a polypeptide with an ICAT™ type affinity reagent. A control reference specimen and a specimen from an individual to be tested are differentially labeled with the light and heavy forms of the ICAT™ type affinity reagent. The derivatized specimens are combined, and the derivatized molecules cleaved to generate fragments. For example, a polypeptide molecule can be enzymatically cleaved with one or more proteases into peptide fragments. Exemplary proteases useful for cleaving polypeptides include trypsin, chymotrypsin, pepsin, papain, Staphylococcus aureus (V8) protease, and the like. Polypeptides can also be cleaved chemically, for example, using CNBr, acid or other chemical reagents.

Once cleaved into fragments, the tagged fragments derivatized with the ICAT™ type affinity reagent are isolated via the affinity tag. For example, biotinylated fragments can be isolated by binding to avidin in a solid phase or chromatographic format. If desired, the isolated, tagged fragments can be further fractionated using one or more alternative separation techniques, including ion exchange, reverse phase, size exclusion, affinity chromatography and the like, or electrophoretic methods, including isoelectric focusing. For example, the isolated, tagged fragments can be fractionated by high performance liquid chromatography (HPLC), including microcapillary HPLC.

The fragments are analyzed using mass spectrometry (MS). Because the specimen molecules are differentially labeled with light and heavy affinity tags, the peptide fragments can be distinguished on MS, allowing a side-by-side comparison of the relative amounts of each peptide fragment from the control reference and test specimens. If desired, MS can also be used to sequence the corresponding labeled peptides, allowing identification of molecules corresponding to the tagged peptide fragments.

An advantage of the ICAT™ type reagent method is that the pair of peptides tagged with light and heavy ICAT™ type reagents are chemically identical and therefore serve as mutual internal standards for accurate quantification (Gygi et al., supra, 1999). Using MS, the ratios between the intensities of the lower and upper mass components of pairs of heavy- and light-tagged fragments provide an accurate measure of the relative abundance of the peptide fragments. Thus, the ICAT™ type reagent method can be conveniently used to identify differentially expressed polypeptides, if desired.

An IDEnT reagent can be used to modify a polypeptide by introducing an isotopic tag at a specific protein functional group. An exemplary IDEnT reagent is described in Goodlett et al., supra, 2000. An IDEnT reagent contains at least one element with an isotopic distribution that creates a unique signature in a mass spectrometer. For example, an IDEnT reagent can contain chlorine, deuterium, or another element, including a radioactive element. An IDEnT reagent can be designed to bind to a low abundance amino acid in a polypeptide, such as cysteine. The labeling of a polypeptide with an IDEnT tag can be applied to the methods of the invention by providing a constraint for searching a polypeptide identification index with polypeptide fragment masses.

Protein cleavage or fragmentation is useful in the methods of the invention for providing a constraint for database searching. Polypeptide fragmentation can be sequence-specific or non-specific. Sequence-specific polypeptide cleavage provides the advantage of obtaining polypeptide fragments that contain known amino acids which can be used to constrain a database search. Examples of reagents useful for performing non-specific polypeptide cleavage are papain, pepsin and protease Sg. These proteases can be used to achieve a desired degree of protein fragmentation, such as, for example, the generation of about two to four polypeptide fragments from a polypeptide by altering the reaction conditions. Conditions for using these proteases are well known in the art. Examples of reagents useful for performing sequence-specific polypeptide cleavage are trypsin, V-8 protease, o-iodosobenzoic acid, cyanogen bromide and acid.
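As an illustration of how sequence-specific cleavage yields fragments containing known amino acids, the following minimal sketch performs an in silico tryptic digest and then keeps only the cysteine-containing peptides. The cleavage rule used here (cleavage C-terminal to Lys or Arg unless the next residue is Pro) is a commonly used simplification and is an assumption of this sketch; the function names and the example sequence are hypothetical and are not taken from the methods described herein.

```python
def tryptic_digest(sequence):
    """Split a protein sequence after K or R, except when followed by P.

    Simplified trypsin rule (assumption); missed cleavages are ignored.
    """
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        # Trailing peptide that does not end in K or R
        peptides.append(sequence[start:])
    return peptides


def cysteine_peptides(peptides):
    """Keep only peptides that contain at least one cysteine residue."""
    return [p for p in peptides if "C" in p]


# Hypothetical sequence, for illustration only
example_protein = "MKCAPRPLSCKGGR"
all_peptides = tryptic_digest(example_protein)
tagged_subset = cysteine_peptides(all_peptides)
print(all_peptides)
print(tagged_subset)
```

Every peptide produced this way, other than a possible C-terminal peptide, ends in Lys or Arg, and every peptide in the filtered subset contains Cys; both facts can serve as search constraints of the kind described above.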

The invention also provides a polypeptide identification index for identifying a polypeptide from a population of polypeptides. The index comprises an annotated set of characteristics associated with polypeptides in the index, one of the characteristics being the mass of a fragment of the polypeptide, the fragment mass being determined by mass spectrometry in the absence of ion selection for producing fragment ions. The characteristics are sufficient to distinguish one of the polypeptides from other polypeptides in the index. A polypeptide identification index can comprise characteristics for 2 or more, 3 or more, 5 or more, 10 or more, 20 or more, 50 or more, 100 or more, 200 or more, 500 or more, 1000 or more, 2000 or more, 5000 or more, or even 10,000 or more polypeptides. A polypeptide identification index can also include substantially all of the polypeptides in a sample. For example, a polypeptide identification index can include substantially all of the polypeptides expressed in a genome, such as a viral, bacterial, plant, or animal genome, including a mammalian genome such as human, non-human primates, mouse, rat, bovine, goat, rabbit, or other mammalian species. The number of polypeptides in a polypeptide identification index will depend on the needs of the user and will vary depending on the source of the sample to be used to identify polypeptides and the complexity of polypeptide expression in the sample.

The polypeptide identification index can be directed to a whole organism or to particular tissues or cells in an organism or to specific subcellular fractions, for example, organelles, as desired. Accordingly, similar to the reduction in complexity applied to a sample to be tested, a polypeptide identification index directed to a particular target such as an organism, tissue, cell or subcellular fraction, can be useful for simplifying a search for identification of a particular polypeptide in a particular application. For example, in a particular diagnostic application where expression of a particular polypeptide or group of polypeptides, or the amount of expression of the polypeptides, is correlated with a particular condition such as a disease condition, a polypeptide identification index directed to a relevant target can be used. For example, if a group of nuclear proteins are known to be overexpressed in a cancer cell, a polypeptide identification index directed to nuclear proteins can be used to test for overexpression of the nuclear proteins in a sample from an individual using the quantitative methods disclosed herein. Moreover, the generation of a targeted polypeptide identification index and comparison to a relevant disease sample can be used to identify aberrantly expressed polypeptides, which in turn can be used in diagnostic applications, as disclosed herein.

The invention additionally provides a polypeptide identification index comprising an annotated set of characteristics associated with polypeptides of the index comprising two or more characteristics associated with polypeptides of the index, or a fragment thereof, one of the characteristics being the mass of a fragment of the polypeptide, the fragment mass being determined by mass spectrometry in the absence of ion selection for producing fragment ions, and wherein the mass is determined at an accuracy in ppm of greater than 1 ppm.

If desired, a polypeptide identification index can be conveniently stored on a computer readable medium. Accordingly, the invention provides a computer readable medium comprising an invention polypeptide identification index, for example, an annotated polypeptide index. Such a computer readable medium comprising a polypeptide identification index is useful for comparing the characteristics of a polypeptide with the polypeptide identification index, which can be conveniently performed on a computer apparatus. The use of a computer apparatus is convenient since a polypeptide identification index can be conveniently stored and accessed for comparison to characteristics and/or quantitative amounts of a polypeptide in a sample. A polypeptide identification index can be conveniently accessed using appropriate hardware, software, and/or networking, for example, using hardware interfaced with networks, including the internet.

By using various hardware, software and network combinations, the methods of the invention including the step of comparing the characteristics determined for a polypeptide to a polypeptide identification index can be conveniently performed in a variety of configurations. Accordingly, the invention additionally provides a computer apparatus for carrying out computer executable steps corresponding to steps of invention methods. For example, a single computer apparatus can contain instructions for carrying out the computer executable step(s) of comparing characteristics determined for a polypeptide to a polypeptide identification index, a polypeptide identification index, and instructions for determining whether the characteristics determined for the polypeptide correspond to one or more polypeptides in the polypeptide identification index.

Alternatively, the computer apparatus can contain instructions for carrying out the steps of an invention method while the polypeptide identification index is stored on a separate medium. In addition, instructions for determining whether a polypeptide corresponds to one or more polypeptides in the polypeptide identification index can be contained on a separate computer apparatus or separate medium, or combined with the computer apparatus containing the computer executable steps of the method and/or the database on a separate medium. Such a separate computer readable medium can be another computer apparatus, a storage medium such as a floppy disk, Zip disk or a server such as a file-server, which can be accessed by a carrier wave such as an electromagnetic carrier wave. Thus, a computer apparatus containing a polypeptide identification index or a file-server on which the polypeptide identification index is stored can be remotely accessed via a network such as the internet. One skilled in the art will know or can readily determine appropriate hardware, software or network interfaces that allow interconnection of an invention computer apparatus.

It is understood that modifications which do not substantially affect the activity of the various embodiments of this invention are also included within the definition of the invention provided herein. Accordingly, the following examples are intended to illustrate but not limit the present invention.

Example I Generation of an Annotated Polypeptide Index

This example describes the generation of an annotated polypeptide index and use of the annotated polypeptide index to identify a polypeptide in a sample.

The elements of an annotated polypeptide (AP) index, also referred to as an annotated peptide tag (APT) index or database, are the sequences of essentially all the peptides or selected peptides with specific structural features that are generated by sequence specific chemical or enzymatic fragmentation of the proteins produced by the species, cell or tissue under investigation. Each peptide is annotated with attributes, or characteristics, that are easily determined experimentally and that permit the unambiguous correlation between the annotated peptide and the protein from which the peptide originated.

The generation of an exemplary AP index can involve the following specific steps: harvest proteins; label proteins with an isotope coded affinity tag (ICAT™) type reagent; fractionate proteins by molecular weight; digest proteins with a protease, for example, trypsin, to generate peptides; separate peptides by chromatography, for example, ion exchange chromatography; purify each ion exchange fraction by affinity chromatography, for example, based on the ICAT™ type affinity tag; analyze each affinity chromatography fraction by LC/MS/MS or CE/MS/MS; identify essentially all expressed proteins via a database search of individual MS/MS peptide spectra; and generate a database of annotated peptide tags that constitute a unique bar code for an individual peptide based on measured physicochemical properties and thus the parent protein of that peptide.

The AP index can be generated as follows: (i) protein sample preparation; (ii) protein tagging; (iii) optional protein fractionation; (iv) protein fragmentation; (v) peptide separation; (vi) affinity isolation of tagged peptides; (vii) high resolution peptide separation; (viii) database searching; (ix) AP index (APT database) construction.

(i) Protein Sample Preparation. Protein samples for which quantitative proteome analysis is to be performed, for example, cells, tissues, subcellular fractions, body fluids, cellular secretions, and the like, are isolated from the respective sources using standard protocols for maintaining the solubility of all the proteins.

(ii) Protein Tagging. The proteins in the sample are completely denatured and reduced, and all the sulfhydryl groups representing the side chains of reduced cysteine residues are covalently derivatized with the light or heavy form, respectively, of sulfhydryl-specific ICAT™ type reagents using the conditions described previously (Gygi et al., Nature Biotechnol. 17:994-999 (1999)) (see FIG. 3). While cysteine tagging is a particularly useful implementation of the method, any other chemical reaction with specificity for a chemical group in the protein can also be applied.

(iii) Optional Protein Fractionation. The mixture of tagged proteins is fractionated using any one of the known standard protein separation procedures. The applied procedure is reproducible, maintains the proteins in solution, and has a high sample capacity. A particularly useful method is preparative sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE).

(iv) Protein Fragmentation. The proteins in the sample mixture, or the proteins contained in each fraction if optional sample fractionation is employed, are subjected to sequence specific cleavage. A preferred method is tryptic cleavage.

(v) Peptide Separation. The resulting peptide mixtures are subjected to a first dimension peptide separation. The peptide separation method has a high sample capacity, at least moderate resolving power, and generates highly reproducible separation patterns, irrespective of the complexity of the sample applied. A particularly useful first dimension separation method is cation exchange chromatography.

(vi) Affinity Isolation of Tagged Peptides. Peptides tagged with the ICAT™ type reagent, i.e., cysteine containing peptides, are isolated from each chromatographic fraction using avidin or streptavidin affinity chromatography. A particularly useful affinity medium is monomeric avidin immobilized on polymer beads. If ICAT™ type reagents with affinity tags different from biotin are used, affinity media complementary to that tag are used.

(vii) High Resolution Peptide Separation. A particularly useful method for high resolution peptide separation is liquid chromatography ESI-MS/MS. The peptide mixtures eluted from the affinity chromatography columns are individually analyzed by automated LC-MS/MS using capillary reversed phase chromatography as the separation method (Yates et al., Methods Mol. Biol. 112:553-569 (1999)) and data dependent CID with dynamic exclusion (Goodlett et al., Anal. Chem. 15:1112-1118 (2000)) as the mass spectrometric method.

(viii) Database Searching. The sequence of all the peptides for which suitable CID spectra are obtained is determined by searching a sequence database from the species under investigation. A particularly useful sequence database is a database containing all the complete protein sequences that can be potentially expressed by the species under examination. The sequence database search program is the Sequest program (Eng, J. et al., J. Am. Soc. Mass. Spectrom. 5:976-989 (1994)) or a program with similar capabilities.

(ix) AP Index (APT Database) Construction. The sequences of all the peptides that have been identified by the procedure described above are entered in a database and annotated with the characteristics, or attributes, that were generated during steps (i)-(viii) above. These characteristics, or attributes, include, but are not limited to: partial amino acid composition (such as the presence of a cysteine residue in each selected peptide; see Goodlett et al., supra, 2000); the approximate molecular mass of the parent protein (as determined by the optional SDS-PAGE fractionation); the order of elution or elution time from the cation exchange column; and the elution time from the reversed-phase column. Collectively, these attributes are unique for every peptide in the database, akin to a bar code for each peptide. Therefore, if the same bar codes are determined for peptides generated in subsequent experiments, those peptides are uniquely identified simply by correlating the bar codes generated by the experiment with the bar codes present in the AP index (APT database).
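The bar code idea can be sketched as a simple keyed lookup. The following is a minimal illustration only, assuming that each attribute has already been reduced to a discrete bin; the field names, the bin values, and the peptide and protein entries are hypothetical and do not come from the procedure described above.

```python
from typing import NamedTuple


class BarCode(NamedTuple):
    """Discretized attributes annotating one peptide (hypothetical fields)."""
    has_cysteine: bool      # partial amino acid composition
    size_bin: int           # SDS-PAGE size bin of the parent protein
    cation_fraction: int    # cation exchange fraction number
    rp_minute: int          # reversed-phase elution time bin


def annotate(index, code, peptide, protein):
    """Enter a peptide into the index, refusing ambiguous bar codes."""
    if code in index and index[code] != (peptide, protein):
        raise ValueError("bar code is not unique: %r" % (code,))
    index[code] = (peptide, protein)


def identify(index, code):
    """Return (peptide, protein) for a measured bar code, or None."""
    return index.get(code)


# Build a toy index (all entries are made up for illustration)
ap_index = {}
annotate(ap_index, BarCode(True, 3, 12, 41), "CAPRPLSCK", "protein_A")
annotate(ap_index, BarCode(True, 7, 5, 18), "GGCLR", "protein_B")

# Correlate a bar code measured in a later experiment with the index
measured = BarCode(True, 3, 12, 41)
print(identify(ap_index, measured))
```

A real index would also need to tolerate small run-to-run shifts in elution time, for example by matching neighboring bins; this sketch matches exact bins only.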

For correlation of polypeptides with the AP index (APT database), the peptide samples generated for quantitative proteome analysis by the method described above are generated, treated and processed precisely like the peptides generated for the AP index (APT database), with the following exceptions. (i) The proteins in the two (or more) samples to be compared are labeled with differentially isotopically labeled ICAT™ type reagents. Two or several chemically identical but differentially isotopically labeled ICAT™ type reagents can be used at this step. (ii) The generated peptides are analyzed by single stage mass spectrometry only, rather than by MS/MS. Mass analyzers will generally have a high mass accuracy, high sensitivity and high mass resolution. Instruments with these characteristics include, but are not limited to, MALDI-TOF mass spectrometers, ESI-TOF mass spectrometers and Fourier transform ion cyclotron resonance mass analyzers (FT-ICR-MS). The attributes determined from each peptide by this process (all the attributes described above or a selection thereof) are translated into a bar code for each peptide, and the experimentally determined bar code is correlated with the bar codes from the AP index (APT database), resulting in the unambiguous identification of the peptide and therefore the protein from which the peptide originated. Subsequently, the mass spectra are examined for pairs of peptide ions that co-fractionated throughout the process and that have a mass difference that precisely corresponds to the mass difference encoded in the ICAT™ type reagent used. The relative signal intensities of the two peaks indicate the relative abundance of the peptides and therefore the relative abundance of the corresponding proteins initially present in the sample. Consequently, the correlation of the experimentally determined data with the AP index (APT database) allows quantification and identification of the proteins in the samples analyzed.
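The search for light and heavy peptide pairs can be sketched as follows. This is a minimal illustration under stated assumptions: the reagent pair differs by eight deuterium atoms (as in the octadeutero linker mentioned earlier), each peptide carries exactly one tag, and the peak list holds neutral monoisotopic masses from a single co-eluting fraction. The tolerance value, the function name, and the peak data are hypothetical.

```python
# Monoisotopic masses of hydrogen and deuterium, in mass units
H_MASS = 1.00782503
D_MASS = 2.01410178

# Mass difference encoded by a d0/d8 reagent pair (assumes one tag per peptide)
ICAT_DELTA = 8 * (D_MASS - H_MASS)


def find_pairs(peaks, delta=ICAT_DELTA, tol=0.02):
    """Find light/heavy pairs in a list of (mass, intensity) peaks.

    Returns tuples of (light_mass, heavy_mass, heavy_to_light_ratio).
    The tolerance `tol` (in mass units) is an illustrative assumption.
    """
    ordered = sorted(peaks)
    pairs = []
    for i, (m_light, i_light) in enumerate(ordered):
        for m_heavy, i_heavy in ordered[i + 1:]:
            if abs((m_heavy - m_light) - delta) <= tol:
                pairs.append((m_light, m_heavy, i_heavy / i_light))
    return pairs


# Hypothetical peak list: (mass, intensity)
peaks = [(1000.50, 200.0), (1008.55, 100.0), (1500.00, 50.0)]
for light, heavy, ratio in find_pairs(peaks):
    print(light, heavy, ratio)
```

In this toy peak list the first two peaks differ by about 8.05 mass units and are reported as a pair with a heavy-to-light intensity ratio of 0.5; the third peak has no partner. Peptides with more than one tagged cysteine would show multiples of the mass difference and are not handled here.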

Example II Generation of a Yeast Annotated Polypeptide Index

This example describes the generation of an annotated polypeptide index for yeast.

At least 5 mg of total protein was estimated to be required at current mass spectrometer sensitivity to detect low abundance proteins using the LC/LC/MS/MS method (Gygi et al., Proc. Natl. Acad. Sci. USA 97:9390-9395 (2000)), and this amount was essentially experimentally confirmed. Gygi et al., supra, 2000 also demonstrated that the “binning” process is adequate for the detection of low abundance proteins and has sufficient sample capacity to accommodate the relatively large amounts of total sample.

For the construction of the database, the following procedure is used. For protein labeling, a protein sample is generated in 0.5% SDS, 50 mM Tris, pH 8.3, 5 mM ethylenediaminetetraacetic acid (EDTA) at a protein concentration of 5 mg/ml. A total of 25 mg of total yeast protein is used. Once proteins are in solution, the SDS concentration is lowered by diluting the sample 1:10 with water and adding EDTA to maintain a 5 mM EDTA concentration. The final concentration is 0.05% SDS, 5 mM Tris, 5 mM EDTA. The sample is then boiled for 3-5 min at 100° C. and then chilled. Reduction of disulfide bonds is accomplished by adding sufficient tributylphosphine (TBP) to achieve 5 mM in the sample solution. This is followed by an incubation of the sample at 37° C. for 30 min. To the reduced sample, the alkylating reagent (e.g., ICAT™ type reagent) is added at an estimated 5× molar excess over the SH groups present in the sample. The alkylation reaction is allowed to proceed in darkness for 90 min.
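The dilution arithmetic in this step can be checked with a short calculation. The sketch below assumes that "1:10" means a ten-fold dilution (one part sample in ten parts final volume), which is consistent with the stated final concentrations of 0.05% SDS and 5 mM Tris; the variable names are illustrative only.

```python
# Starting conditions stated in the text
protein_mg = 25.0
protein_conc_mg_per_ml = 5.0
sds_percent = 0.5
tris_mM = 50.0
edta_mM = 5.0

# Assumption: "1:10" is a ten-fold dilution
dilution_factor = 10

initial_volume_ml = protein_mg / protein_conc_mg_per_ml
final_volume_ml = initial_volume_ml * dilution_factor

final_sds_percent = sds_percent / dilution_factor
final_tris_mM = tris_mM / dilution_factor
final_protein_mg_per_ml = protein_conc_mg_per_ml / dilution_factor

# EDTA needed to hold 5 mM after dilution (mM x ml = micromoles)
edta_to_add_umol = edta_mM * final_volume_ml - edta_mM * initial_volume_ml

print(initial_volume_ml, final_volume_ml)
print(final_sds_percent, final_tris_mM, final_protein_mg_per_ml)
print(edta_to_add_umol)
```

Under these assumptions the 5 ml starting sample becomes 50 ml at 0.5 mg/ml protein, and about 225 micromoles of EDTA must be added to keep EDTA at 5 mM.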

For protein separation, the reduced and alkylated sample is mixed with 0.2 volume of 5×SDS gel sample buffer and boiled for 5 min. The cooled sample is then applied to a preparative SDS gel with the dimensions 20 cm×20 cm×1.5 mm. After electrophoresis, the gel is sliced perpendicular to the electrophoresis dimension into 10 strips of equal size. These strips represent 10 size bins for the intact proteins. The proteins in the gel strips are then subjected to in-gel digestion using standard protocols.

For peptide separation, the peptides that are extracted from the gel slices are subjected to three sequential chromatographic separations. First, they are separated by cation exchange chromatography. Second, the biotinylated peptides are isolated by avidin chromatography. Third, peptides are further fractionated by capillary reverse-phase chromatography.

For cation exchange chromatography, a cation exchange HPLC column is used (PolyLC Inc., 2.1 mm×20 cm, 5 μm particles, 300 Å pore size, Polysulfoethyl A strong cation exchange material). The following buffers are used: Buffer A, 10 mM KH₂PO₄, 25% CH₃CN, pH 3.0; Buffer B, 10 mM KH₂PO₄, 25% CH₃CN, 350 mM KCl, pH 3.0. The following gradient is run:

Time (min)    % B
0             0
30            25
50            100

The flow rate is 200 μL per minute. Fractions are collected at 1-2 minute intervals. Anywhere from about 200 micrograms up to about 5 mg of digested, ICAT™ labeled total protein is loaded on this column, usually using a 2 mL sample loop. It is important to acidify the samples down to pH 3.0 or below before loading onto the cation exchange column because peptides will not be fully charged at higher pH values and may not bind to the column. The gradient shown is designed to spread out the elution of doubly-charged peptides as much as possible, with these peptides usually eluting starting at about 8-9 minutes into the run until approximately minutes 15-16, after which triply charged peptides begin to elute. Thirty fractions are collected over the duration of the gradient.
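The gradient table can be turned into a small helper that reports the buffer composition at any time point. This sketch assumes that the gradient is linear between the listed time points and that the KCl concentration scales directly with the percentage of Buffer B (Buffer A contains no KCl and Buffer B contains 350 mM); the function names are illustrative.

```python
# Gradient breakpoints from the table: (time in min, % Buffer B)
GRADIENT = [(0.0, 0.0), (30.0, 25.0), (50.0, 100.0)]

KCL_IN_BUFFER_B_MM = 350.0


def percent_b(t_min):
    """Percent Buffer B at time t_min, assuming linear segments."""
    if t_min <= GRADIENT[0][0]:
        return GRADIENT[0][1]
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    # After the last breakpoint, hold the final composition
    return GRADIENT[-1][1]


def kcl_mM(t_min):
    """Approximate KCl concentration delivered at time t_min."""
    return KCL_IN_BUFFER_B_MM * percent_b(t_min) / 100.0


for t in (0, 15, 30, 40, 50):
    print(t, percent_b(t), kcl_mM(t))
```

Under these assumptions the first 30 minutes ramp slowly to 87.5 mM KCl, which is the shallow region in which the doubly charged peptides are spread out, and the final 20 minutes ramp steeply to 350 mM.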

For avidin affinity chromatography, Ultralink Monomeric Avidin (Pierce, cat #53146) is used. A small piece of glass wool is packed into the neck of a glass pipette tube. 400 μl of avidin chromatography material is packed into the tube (slurry comes at 50% dilution, so 800 μl of 50% slurry is added in order to get 400 μl avidin chromatography material). The beads are allowed to settle, and the column is washed with 2×PBS to bring the beads down off the side of the tube. The packed column is washed through with 30% acetonitrile (ACN) with 0.4% trifluoroacetic acid (TFA) until the flow-through pH changes to ~1, and then another column volume of the ACN/TFA is washed through. This acidic step is to get rid of polymers associated with the beads. The column is washed with 2×PBS pH 7.2 until the pH is ~7.2. Thereafter, the column is washed through with 3 more column volumes (1200 μl) of the same buffer.

The column is washed with 3-4 column volumes of biotin blocking buffer (2 mM d-biotin in PBS). This biotin blocks the more retentive avidin sites on the column, ensuring recovery of the sample from the remaining binding sites later on.

Loosely bound biotin is washed off with ~6 column volumes (2,400 μl) of regeneration buffer (100 mM glycine, pH 2.8), until the flow-through pH changes to ~2.8. This glycine solution is sterilized by autoclaving before use and stored at 4° C., where it will last one week.

The column is washed with 6 column volumes of 2×PBS to return the column to the proper pH (~7.2). The flow-through pH is monitored. The peptide samples consisting of individual or pooled ion exchange column fractions are then applied to the column and incubated in the column for ~20 min. Unbound material is then washed from the column by applying 5 column volumes of 2×PBS, pH 7.2, and fractions are collected. The column is further washed through with 5 column volumes of 1×PBS (this step is to reduce the salt concentration), and 6 column volumes of 50 mM AMBIC, pH 8.3, with 20% methanol (MeOH) while continuing to collect fractions (50 mM AMBIC is to bring the salt concentration down; MeOH is to get rid of hydrophobic peptides). Biotinylated peptides are eluted with elution buffer (30% ACN/0.4% TFA) and collected manually in a glass tube. These samples are further separated by capillary reverse-phase chromatography.

For reverse-phase chromatography, purified biotinylated peptides are separated by reverse-phase capillary chromatography using standard protocols. The solvent gradient is chosen so that the peptides elute over 60 min. If 1 min fractions are collected and analyzed in the mass spectrometer, 60 bins are created by this procedure.

For mass analysis and sequencing, biotinylated peptides eluting from the RP-columns are analyzed by ESI-MS/MS for the generation of the database and by ESI-MS for database searching. For the database construction, an ion trap mass spectrometer is used with a mass accuracy of approximately 1 mass unit. For the mass measurement for database searching, an ESI-TOF mass spectrometer is used with a mass accuracy exceeding 50 ppm and a resolution exceeding 10,000. That means that a peptide at 1000 mass units can be distinguished from a peptide at 1000.1 mass units. If a mass range for tryptic peptides is assumed to be between 500 and 3000 mass units, a mass spectrometer at that performance would generate 25,000 bins.

In the case of proteome analysis of the yeast S. cerevisiae, the genome of this organism contains approximately 6200 ORFs. The yeast proteome therefore is expected to be approximately 6200 different proteins, disregarding differentially modified forms of the same protein. Tryptic digestion of a yeast proteome would yield approximately 350,000 peptides if empirically derived specificity rules for trypsin are applied. This sample complexity is reduced to approximately 35,000 peptides if only the cysteine-containing peptides are extracted, based on the chemical derivatization with the ICAT™ type reagents. The total number of bins available from the procedure described above is 10×30×60×25,000=4.5×10⁸ and therefore by far exceeds the number of peptides expected from a total yeast proteome analysis. It is therefore expected that for a sample of the complexity of a yeast extract, the procedure for the generation of database search data can be simplified. As an example, the gel electrophoresis sizing step for proteins can be optionally eliminated.
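The bin arithmetic above can be reproduced with a short calculation. The sketch uses only the figures stated in this example (10 size bins, 30 cation exchange fractions, 60 reverse-phase fractions, a 500 to 3000 mass unit range resolved at 0.1 mass unit, and approximately 35,000 cysteine-containing peptides); the variable names are illustrative.

```python
# Bins contributed by each separation dimension (from the text)
size_bins = 10          # SDS-PAGE gel strips
cation_fractions = 30   # cation exchange fractions
rp_fractions = 60       # 1 min reverse-phase fractions over 60 min

# Mass bins: 500-3000 mass units resolved at 0.1 mass unit
mass_range = 3000 - 500
mass_bin_width = 0.1
mass_bins = round(mass_range / mass_bin_width)

total_bins = size_bins * cation_fractions * rp_fractions * mass_bins

# Expected number of cysteine-containing tryptic peptides in yeast
expected_peptides = 35000

# Effect of omitting the optional gel sizing step
bins_without_gel = cation_fractions * rp_fractions * mass_bins

print(mass_bins)                            # 25000
print(total_bins)                           # 450000000
print(total_bins / expected_peptides)       # bins available per peptide
print(bins_without_gel > expected_peptides)
```

Even without the gel sizing step, 45,000,000 bins remain for approximately 35,000 peptides, which is consistent with the statement that the procedure can be simplified for a sample of this complexity. The calculation treats bins as equally likely to be occupied, which real peptide distributions will not satisfy exactly.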

Neither the procedure for the generation of the data to be entered into the database nor the procedure for the generation of data to search the database is fixed. Therefore, for optimization, depending on the degree of sample complexity, the number of bins available can be easily adjusted. Generally, the number of bins chosen for the generation of the database is high, whereas the number of bins for generating the database search data would be chosen as low as possible to maximize the sample throughput. The number of bins available can be easily adapted in various ways. Firstly, the inclusion of additional orthogonal separation dimensions can be considered for proteins or peptides. For protein separation, isoelectric focusing, ion exchange chromatography, hydroxylapatite chromatography, or similar electrophoretic or chromatographic techniques can be included. For peptide separation, separation based on peptide size or capillary electrophoresis methods can be included.

Secondly, the separation range for the separation methods described above can be extended. Protein sizing can be extended by using gradient gels or longer gels with extended separation range. For the chromatographic peptide separation methods, the number of bins can be easily expanded by generating extended, shallower gradients and/or by sampling more frequently. Finally, the number of bins is critically dependent on the resolution and mass accuracy of the mass analyzer used. Adding a mass analyzer with higher performance will decrease the number of bins that must be provided by the separation methods employed in the procedure.

Example III Use of ESI-TOF for MS Analysis of Complex Samples

This example describes a method for determining a polypeptide identification subindex using the mass of polypeptide fragments determined by ESI-TOF.

The masses of a set of polypeptides, or fragments thereof, were determined using ESI-TOF in order to constrain a polypeptide identification index by mass values to generate a polypeptide identification subindex for use with complex genomes. Two peptides, ASHLAGAR (P1) and RPPGFSPR (P2), were infused together into an ESI-TOF. The mixture was intended to simulate co-elution of peptides during HPLC separation. Spectra were acquired at low (FIG. 4A) and high (FIG. 4B) V_(Nozzle-Skimmer).

As can be seen in FIG. 4B, there are numerous fragment ions present for both peptides. No fragment ions appeared above the P1 1+ ion and so this m/z range was omitted from FIG. 4B. An algorithm was written to deconvolute the mixture of parent and fragment ions in a single mass spectrum (FIG. 4B). P1 and P2 sequences were placed into a list of all possible tryptic peptides from the database of 60884 human polypeptides. Comparison of P1 and P2 observed masses (FIG. 4A) to all possible tryptic peptides from the 60884 polypeptides produced lists of 57 and 124 isobars, respectively. The list of b- and y-ion fragments from all isobars calculated to 10 ppm was then compared to the observed fragment ions between 500 and 825 m/z in FIG. 4B. This process produced lists of 12 and 13 possible polypeptide identifications for P1 and P2, respectively, as shown in FIG. 4C. This method is useful for applying a constraint to a polypeptide identification index to generate a polypeptide identification subindex for use with complex genomes. In-source CID on peptides can be acquired easily and quickly in a continuously alternating fashion by ESI-TOF and potentially by in-source decomposition in MALDI-TOF. Furthermore, the method does not result in loss of data as occurs with tandem MS where, during CID of a selected ion, co-eluting ions are necessarily omitted from analysis. If the peptides are of low abundance (i.e., low signal-to-noise), then they will not be selected for CID by data dependent (DD) tandem MS processes. This is because standard DD processes first examine the base peak (i.e., the most intense ion per m/z window), carry out CID on it, dynamically exclude it from further consideration, and proceed to the next most abundant ion.
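The two-stage comparison described above (parent mass first, then b- and y-ion fragments) can be sketched as follows. This is an illustrative Python sketch only and is not the algorithm written for this example: the function names are assumptions, only singly charged ions are considered, no m/z window is applied, the residue masses are standard monoisotopic values, and AHSLAGAR is a hypothetical isobar of P1 introduced solely to show the filtering.

```python
# Standard monoisotopic residue masses (Da) for the residues used here.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "L": 113.08406, "H": 137.05891, "F": 147.06841, "R": 156.10111,
}
WATER = 18.010565
PROTON = 1.007276

def neutral_mass(seq):
    """Monoisotopic neutral mass of a peptide."""
    return sum(RESIDUE[a] for a in seq) + WATER

def by_ions(seq):
    """Singly charged b- and y-ion m/z values for a peptide."""
    ions = []
    for i in range(1, len(seq)):
        b = sum(RESIDUE[a] for a in seq[:i]) + PROTON
        y = sum(RESIDUE[a] for a in seq[i:]) + WATER + PROTON
        ions.extend([b, y])
    return ions

def within_ppm(observed, theoretical, ppm):
    """True if observed lies within the ppm tolerance of theoretical."""
    return abs(observed - theoretical) <= theoretical * ppm * 1e-6

def find_isobars(parent_mass, peptides, ppm):
    """Stage 1: peptides whose mass matches the observed parent mass."""
    return [p for p in peptides if within_ppm(parent_mass, neutral_mass(p), ppm)]

def rank_by_fragments(isobars, observed_mz, ppm=10.0):
    """Stage 2: count observed fragments explained by each isobar."""
    scored = []
    for seq in isobars:
        theoretical = by_ions(seq)
        hits = sum(
            any(within_ppm(obs, theo, ppm) for theo in theoretical)
            for obs in observed_mz
        )
        scored.append((seq, hits))
    return sorted(scored, key=lambda item: -item[1])

# Toy index: P1, a hypothetical isobar of P1, and P2.
index = ["ASHLAGAR", "AHSLAGAR", "RPPGFSPR"]
observed = by_ions("ASHLAGAR")  # stand-in for measured fragment ions
isobars = find_isobars(neutral_mass("ASHLAGAR"), index, ppm=50.0)
ranking = rank_by_fragments(isobars, observed, ppm=10.0)
print(isobars, ranking)
```

In this toy index the parent mass alone cannot separate ASHLAGAR from its hypothetical isobar, whereas the fragment comparison can, because the two sequences differ in one b-ion and one y-ion. A real index would contain all tryptic peptides of the database, and a score threshold would replace the simple ranking.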

The peptides shown in FIG. 4C represent a reduction in complexity from over 60,000 possible polypeptides to about 12 or 13 polypeptides. Additional characteristics are determined, for example, atomic mass, amino acid composition, partial amino acid sequence, apparent molecular weight, pI, and order of elution on specific chromatographic media under defined conditions, and the like, for parent polypeptides or one or more additional peptide fragments thereof. Methods for fractionating polypeptides have been previously described (see, for example, Scopes, Protein Purification: Principles and Practice, 3rd ed., Springer Verlag, New York (1993)). A sufficient number of characteristics are determined so that a single polypeptide in the polypeptide identification index is identified.

Throughout this application various publications have been referenced. The disclosures of these publications in their entireties are hereby incorporated by reference in this application in order to more fully describe the state of the art to which this invention pertains.

Although the invention has been described with reference to the disclosed embodiments, those skilled in the art will readily appreciate that the specific experiments detailed are only illustrative of the invention. It should be understood that various modifications can be made without departing from the spirit of the invention. Accordingly, the invention is limited only by the following claims.

What is claimed is:
 1. A method for identifying a polypeptide, comprising: (a) simultaneously determining the mass of a subset of parent polypeptides from a population of polypeptides and the mass of fragments of said subset of parent polypeptides; (b) comparing said determined masses to an annotated polypeptide index; and (c) identifying one or more polypeptides of said annotated polypeptide index having said determined masses.
 2. The method of claim 1, further comprising: (d) determining one or more additional characteristics associated with one or more of said parent polypeptides; (e) comparing said characteristics determined in step (a) and step (d) to said annotated polypeptide index; and (f) optionally repeating steps (d) and (e) one or more times, wherein a set of characteristics is determined that identifies a parent polypeptide as a single polypeptide in said annotated polypeptide index.
 3. The method of claim 1, further comprising quantitating the amount of said identified polypeptide in a sample containing said polypeptide.
 4. The method of claim 2, wherein a set of characteristics is determined that identifies two or more parent polypeptides as single polypeptides in said annotated polypeptide index.
 5. The method of claim 4, wherein a set of characteristics is determined that identifies each of said parent polypeptides in said subset of parent polypeptides.
 6. The method of claim 1, wherein said fragment mass is determined by mass spectrometry in the absence of ion selection for producing fragment ions.
 7. The method of claim 1, wherein said fragment mass is determined at an accuracy in ppm of greater than 1 ppm.
 8. The method of claim 1, wherein said fragment mass is determined at an accuracy in ppm of 2.5 ppm or greater ppm.
 9. The method of claim 1, wherein said fragment mass is determined at an accuracy in ppm of 5 ppm or greater ppm.
 10. The method of claim 1, wherein said fragment mass is determined at an accuracy in ppm of 10 ppm or greater ppm.
 11. The method of claim 1, wherein said fragment mass is determined at an accuracy in ppm of 100 ppm or greater ppm.
 12. The method of claim 13, wherein said characteristics are selected from the group consisting of polypeptide mass, amino acid composition, pI, and order of elution on a chromatographic medium.
 13. A method for identifying a polypeptide, comprising: (a) simultaneously determining the mass of a subset of parent polypeptides from a population of polypeptides and the mass of fragments of said subset of parent polypeptides; (b) comparing said determined masses to an annotated polypeptide index; (c) identifying one or more polypeptides of said annotated polypeptide index having said determined masses; and (d) quantitating the amount of said identified polypeptide in a sample containing said polypeptide.
 14. The method of claim 13, further comprising: (e) determining one or more additional characteristics associated with one or more of said parent polypeptides; (f) comparing said characteristics determined in step (a) and step (e) to said annotated polypeptide index; and (g) optionally repeating steps (e) and (f) one or more times, wherein a set of characteristics is determined that identifies a parent polypeptide as a single polypeptide in said annotated polypeptide index.
 15. The method of claim 14, wherein a set of characteristics is determined that identifies two or more parent polypeptides as single polypeptides in said annotated polypeptide index.
 16. The method of claim 15, wherein a set of characteristics is determined that identifies each of said parent polypeptides in said subset of parent polypeptides.
 17. The method of claim 13, wherein said fragment mass is determined by mass spectrometry in the absence of ion selection for producing fragment ions.
 18. The method of claim 13, wherein said fragment mass is determined at an accuracy in ppm of greater than 1 ppm.